Using high performance computing for unrelated parallel machine scheduling with sequence-dependent setup times: Development and computational evaluation of a parallel branch-and-price algorithm

https://doi.org/10.1016/j.cor.2018.12.020

Highlights

  • A branch-and-price algorithm for parallel machine scheduling is proposed.

  • The algorithm is parallelized using a master/worker approach.

  • Extensive computational studies on parallel scalability have been conducted.

  • Execution times can be reduced substantially by our parallelization approach.

Abstract

Scheduling problems are essential for decision making in many academic disciplines, including operations management, computer science, and information systems. Since many scheduling problems are NP-hard in the strong sense, there is only limited research on exact algorithms and how their efficiency scales when implemented on parallel computing architectures. We address this gap by (1) adapting an exact branch-and-price algorithm to a parallel machine scheduling problem on unrelated machines with sequence- and machine-dependent setup times, (2) parallelizing the adapted algorithm by implementing a distributed-memory parallelization with a master/worker approach, and (3) conducting extensive computational experiments using up to 960 MPI processes on a modern high performance computing cluster. With our experiments, we show that the efficiency of our parallelization approach can lead to superlinear speedup but can vary substantially between instances. We further show that the wall time of serial execution can be substantially reduced through our parallelization, in some cases from 94 h to less than 6 min when our algorithm is executed on 960 processes.

Introduction

In this article, we study the parallel machine scheduling problem on unrelated machines with sequence- and machine-dependent setup times, machine eligibility restrictions, and the total weighted completion time as objective function. This problem is well-known in the scheduling literature and classified as R/sijk, Mj/∑wjCj (Pinedo, 2012) in the established α/β/γ-notation (Graham et al., 1979). For convenience, we refer to this problem as UPMSP (Unrelated Parallel Machine Scheduling Problem) in the remainder of this article. UPMSP is NP-hard in the strong sense since the more specific problem P//∑wjCj of minimizing the total weighted completion time on identical machines is NP-hard in the strong sense (Skutella and Woeginger, 2000).

The problem can be described as follows: a set of jobs has to be processed on a set of machines, where (i) each job has to be processed exactly once, (ii) the processing of a job must not be interrupted (non-preemption), and (iii) a machine may be capable of processing a job or not (machine eligibility restrictions). Processing times depend on the job and the processing machine, while sequence-dependent setup times depend on the job, the preceding job, and the processing machine. Furthermore, each job has a priority level (weight). The goal of UPMSP is to find a feasible set of machine schedules with minimal total weighted completion time.
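The objective described above can be made concrete with a minimal sketch that evaluates the total weighted completion time of a given set of machine schedules. The function name, the dictionary-based data layout, and the toy numbers are assumptions for illustration only; the sketch also assumes that the first job on a machine needs no setup, which the text does not specify.

```python
from typing import Dict, List, Tuple


def total_weighted_completion_time(
    schedules: Dict[int, List[int]],
    proc: Dict[Tuple[int, int], float],
    setup: Dict[Tuple[int, int, int], float],
    weight: Dict[int, float],
) -> float:
    """Sum of weight[j] * completion time of j over all jobs.

    schedules: machine -> ordered list of jobs processed on it
    proc:      (machine, job) -> processing time; a missing key means the
               machine is not eligible for the job
    setup:     (machine, previous job, job) -> setup time
    Assumption: the first job on a machine starts at time 0 without setup.
    """
    seen = set()
    total = 0.0
    for machine, sequence in schedules.items():
        clock = 0.0
        prev = None
        for job in sequence:
            if (machine, job) not in proc:
                raise ValueError(f"machine {machine} is not eligible for job {job}")
            if job in seen:
                raise ValueError(f"job {job} is scheduled more than once")
            seen.add(job)
            if prev is not None:
                clock += setup[(machine, prev, job)]
            clock += proc[(machine, job)]  # no preemption: job runs to completion
            total += weight[job] * clock
            prev = job
    if seen != set(weight):
        raise ValueError("every job must be processed exactly once")
    return total


# Toy instance (hypothetical data): 2 machines, 3 jobs; machine 1 cannot run job 2.
proc = {(0, 1): 3.0, (0, 2): 2.0, (1, 1): 5.0, (1, 3): 4.0}
setup = {(0, 1, 2): 1.0, (0, 2, 1): 2.0}
weight = {1: 2.0, 2: 1.0, 3: 3.0}

print(total_weighted_completion_time({0: [1, 2], 1: [3]}, proc, setup, weight))  # 24.0
print(total_weighted_completion_time({0: [2, 1], 1: [3]}, proc, setup, weight))  # 28.0
```

Swapping the order of jobs 1 and 2 on machine 0 changes the objective value from 24 to 28, which shows why both the assignment and the sequence on each machine matter.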

There are several real-life settings where decision makers face UPMSP. In disaster response, rescue units (machines) are scheduled to process emergency incidents (jobs) with different priorities (weights). Setups are required by rescue units for traveling between incidents’ locations (Wex et al., 2014). Another application is traffic flow networks, where repairmen (machines) have to repair broken toll plazas or toll bridges (jobs) with different traffic throughput rates (weights). Setups are represented by travel times of repairmen between toll plazas or bridges (Weng et al., 2001). UPMSP is also found in injection molding departments, where machines are used to produce different components (jobs) with certain importance (weights) and for which setup times are required for dies or molds (Chen, 2015).

Solving medium- or large-scale instances of UPMSP (and even of the more specific problem where no machine eligibility restrictions apply and where setup times are not machine-dependent) to optimality is computationally challenging, as recent studies show (Arnaout and Rabadi, 2005; Chen, 2015; Rauchecker and Schryen, 2015; Schryen et al., 2015; Tsai and Tseng, 2007; Weng et al., 2001; Wex et al., 2014). In order to overcome efficiency limitations due to sequentially executed algorithms, researchers can rely on recent technological developments in high performance computing (HPC), which refers to the use of parallel computing architectures. HPC is particularly relevant since speed improvement on a single core is limited for technological reasons (Hager and Wellein, 2010, p. 23). Modern PCs and even smartphones have multiple cores, which allow for parallel code execution. At the extreme, computer clusters and supercomputers, containing up to several millions of cores, are pushing the boundaries of HPC (TOP500, 2017). HPC has been successfully applied to a broad range of problems in many scientific disciplines, including biology, chemistry, physics, geology, weather forecasting, aerodynamic research, and computer science (Bell and Gray, 2002; Vecchiola et al., 2009), but there is little research on using HPC for scheduling problems. Nowadays, taking advantage of HPC does not require having access to a supercomputer; it can also be done on computing clusters, which have become commodity IT resources. For example, they are available at many universities and are offered by some cloud providers, such as Amazon Web Services (Mauch et al., 2013). To sum up, HPC has not only become technologically feasible but also economically affordable (Hager and Wellein, 2010, p. 1).

However, in order to exploit the capabilities of HPC, algorithms have to be parallelized. In this work, we adapt a serial branch-and-price (b&p) algorithm, which was suggested by Lopes and de Carvalho (2007), to solve UPMSP. Their algorithm was designed for the parallel machine scheduling problem on unrelated machines with sequence-dependent setup times, machine availability dates, release dates, due dates, and the total weighted tardiness as objective function. We suggest an algorithmic parallelization of the adapted b&p algorithm and conduct extensive computational experiments on an HPC cluster to analyze the scalability of our parallel approach on a large number of cores.

Scheduling problems appear in many forms and have attracted thousands of research papers. In order to structure this large body of research, comprehensive literature reviews (e.g., Allahverdi, 2015; Allahverdi et al., 1999, 2008; Cheng and Sin, 1990) and books (e.g., Brucker, 2007; Pinedo, 2012; Rabadi, 2016) have been published. We can divide scheduling problems into problems which account for setup times and those which do not. Allahverdi et al. (1999, 2008) and Allahverdi (2015) provide comprehensive surveys of scheduling problems accounting for setup times. They further divide these into problems with sequence-independent setup times and problems with sequence-dependent setup times. In our overview, we focus on problems that can be represented as R/STSD/γ, i.e., scheduling on unrelated parallel machines with sequence-dependent setup times. We further restrict γ to objective functions that are at least as general as the total weighted completion time.

According to Allahverdi et al. (1999, 2008) and Allahverdi (2015), the first research focusing on this type of problem was conducted by Zhu and Heady (2000), who considered due dates and the total weighted earliness/tardiness (wjEj+wjTj) objective function, which is equivalent to the total weighted completion time when all earliness weights and due dates are 0. They modeled R/STSD/wjEj+wjTj by a mixed-integer program (MIP) and were capable of finding optimal solutions for up to 9 jobs and 3 machines. Akyol and Bayhan (2008) present an exact Artificial Neural Network algorithm, but they were not able to solve larger instance sizes. A Tabu Search was presented by Bozorgirad and Logendran (2012), while Zeidi and Mohammad Hosseini (2015) chose a hybrid Genetic Algorithm/Simulated Annealing approach.

In the absence of earliness weights, wjEj+wjTj turns into the total weighted tardiness objective function (∑wjTj), which is equivalent to the total weighted completion time when all due dates are 0. Tavakkoli-Moghaddam and Aramon-Bajestani (2009), Lopes and de Carvalho (2007), and Lopes et al. (2014) present branch-and-bound (b&b) and branch-and-price (b&p) algorithms based on MIP formulations for R/STSD/∑wjTj, with the b&b algorithm being capable of solving instances with up to 10 jobs and 4 machines and the b&p algorithms being capable of solving instances with up to 180 jobs and 50 machines. The total weighted tardiness objective was further tackled with Genetic Algorithms (Joo and Kim, 2012), Tabu Search (Logendran et al., 2007), Simulated Annealing (Kim et al., 2003), and several other heuristic approaches (Alvelos et al., 2016; de Paula et al., 2010; Lin and Hsieh, 2014; Rauchecker and Schryen, 2015; Zhang et al., 2007). Our problem R/STSD/∑wjCj has been formulated as a quadratic binary program and tested with the off-the-shelf solver Gurobi by Wex et al. (2014) and Schryen et al. (2015). Their computational studies indicate that this formulation and strategy are not efficient, as they fail to compute optimal solutions for small-sized instances consisting of 40 jobs and 10 machines within several hours. Other approaches approximate this problem with Genetic Algorithms (Tsai and Tseng, 2007), b&p-based heuristics (Rauchecker and Schryen, 2015), and several problem-specific heuristics (Arnaout and Rabadi, 2005; Chen, 2015; Weng et al., 2001).
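The equivalences between objective functions stated in the two paragraphs above follow directly from the standard definitions of earliness and tardiness. The short derivation below is a sketch; the symbols d_j (due date of job j) and w'_j (earliness weight of job j) are notational assumptions that do not appear in the text.

```latex
\begin{align*}
E_j &= \max(0,\; d_j - C_j), &
T_j &= \max(0,\; C_j - d_j). \\
\intertext{If $d_j = 0$ for all jobs, then $C_j \ge 0$ gives}
E_j &= 0, &
T_j &= C_j, \\
\intertext{and therefore}
\sum_j \bigl( w'_j E_j + w_j T_j \bigr) &= \sum_j w_j T_j = \sum_j w_j C_j .
\end{align*}
```

Any algorithm for the weighted earliness/tardiness or weighted tardiness objective can thus be applied to the total weighted completion time by setting all due dates to 0.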

The common problem of (meta-) heuristic approaches is the fact that they usually cannot guarantee optimality. Consequently, efficient exact algorithms are not only desirable for solving large real-world problems to optimality, but are also an important tool for benchmarking (meta-) heuristics. However, our literature review shows that (i) in contrast to heuristics, very little research on exact algorithms for UPMSP and related problems exists and (ii) none of the suggested exact algorithms has been parallelized in order to leverage the potential of modern HPC capabilities. At the same time, the most promising of the presented exact approaches use b&b and especially b&p algorithms, which offer a high potential for parallelization (e.g., Eckstein, 1994; Migdalas et al., 2013; Ralphs et al., 2003). B&p algorithms were conceptualized by Barnhart et al. (1998) and use both a b&b algorithm and a column generation procedure (Dantzig and Wolfe, 1960; Desaulniers et al., 2006; Lübbecke and Desrosiers, 2005) to solve integer programs with many variables. This kind of algorithm has widely been used for tackling scheduling problems, including studies by Bard and Rojanasoonthon (2006), Fei et al. (2008), van den Akker et al. (1999), and Chen and Powell (1999, 2003). However, only a few studies use parallel implementations of b&b or b&p algorithms to tackle scheduling problems; see, for instance, Perregaard and Clausen (1998), Clausen and Perregaard (1999), Crespo Abril and Maroto Alvarez (2005), Aitzai and Boudhar (2013), and Chakroun et al. (2013). Recently, a new parallel b&p-based heuristic for UPMSP has been suggested (Rauchecker and Schryen, 2015). We contribute to closing the aforementioned research gap by suggesting and computationally validating an exact parallel b&p algorithm for UPMSP as outlined in Section 1.2.

In Section 2, we present the mathematical formulation of our scheduling problem. In Section 3, we propose an adaptation of the serial b&p algorithm suggested by Lopes and de Carvalho (2007) to UPMSP. Section 4 presents a parallelized version of the adapted b&p algorithm. To the best of our knowledge, this is the first time that an exact b&p algorithm for a scheduling problem has been parallelized. In Section 5, we demonstrate the applicability of the parallelized algorithm on a Linux-based HPC cluster with extensive numerical experiments and measure its performance using established scalability metrics. With our experiments, we show that our parallelization approach achieves high efficiencies with even superlinear speedups for some instances. We further show that the wall time of our tested instances is reduced from up to 94 h for the most difficult test instance in serial execution to less than 6 min when the algorithm is executed on 960 processes. We discuss these findings in Section 6 before we finally conclude in Section 7.
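To put the reported wall times into perspective, the following sketch computes speedup and parallel efficiency in their usual textbook form (speedup = serial time / parallel time, efficiency = speedup / number of processes). The function names are hypothetical, and the rounded figures of 94 h and 6 min are taken from the text above; since the parallel run took less than 6 min, the results are lower bounds for that instance.

```python
def speedup(serial_wall_time: float, parallel_wall_time: float) -> float:
    """Ratio of serial to parallel wall time (same time unit for both)."""
    return serial_wall_time / parallel_wall_time


def efficiency(serial_wall_time: float, parallel_wall_time: float, processes: int) -> float:
    """Speedup per process; values above 1 indicate superlinear speedup."""
    return speedup(serial_wall_time, parallel_wall_time) / processes


serial_seconds = 94 * 3600   # 94 h, rounded figure from the text
parallel_seconds = 6 * 60    # upper bound of 6 min from the text

print(speedup(serial_seconds, parallel_seconds))                    # 940.0
print(round(efficiency(serial_seconds, parallel_seconds, 960), 3))  # 0.979
```

With these rounded figures, the speedup on 960 processes is at least 940, which corresponds to an efficiency of roughly 98% or more.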

Section snippets

Problem formulation

We presented an overview of articles on problems which are at least as general as UPMSP in Section 1.1. Regarding exact algorithms, the computational results from Lopes and de Carvalho (2007) and Lopes et al. (2014) for the scheduling problem R/STSD/∑wjTj are most promising in terms of efficiency and, in addition, they solve their model by a highly parallelizable b&p algorithm. Consequently, we adapt their binary linear formulation to UPMSP. We introduce the notation for our formulation in the

Serial branch-and-price algorithm for UPMSP

In this section, we present and discuss an adaptation of a b&p algorithm—introduced by Lopes and de Carvalho (2007) for the problem R/STSD/∑wjTj—to UPMSP. A b&p algorithm is a specific form of a b&b algorithm where all linear relaxations are solved by a column generation procedure (Dantzig and Wolfe, 1960). A first overview on solving integer programs (IPs) with b&p algorithms was provided by Barnhart et al. (1998). A high-level pseudo code of a b&p algorithm that solves general IPs is

Parallelization of the branch-and-price algorithm

In this section, we explain how we parallelize our b&p algorithm from Section 3. According to Gendron and Crainic (1994), there are three types of parallelism for b&b algorithms: first, the solution of single b&b nodes can be executed in parallel (type 1). Second, the b&b tree itself can be parallelized by solving concurrently active nodes of the b&b tree simultaneously (type 2). Third, multiple b&b trees can be built and explored in parallel (type 3). It is also possible to combine those types.
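Type 2 parallelism can be illustrated with a minimal master/worker sketch in which a master keeps the list of active b&b nodes and hands batches of them to workers. This is not the authors' implementation: it uses Python threads in place of MPI processes, a plain b&b with a simple combinatorial lower bound in place of column generation, and a toy single-machine instance. All names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations

# Toy single-machine data (hypothetical, not from the paper).
P = {0: 3.0, 1: 2.0, 2: 4.0, 3: 1.0}   # processing times
W = {0: 2.0, 1: 1.0, 2: 3.0, 3: 2.0}   # weights
S = {(i, j): float(abs(i - j)) for i in P for j in P if i != j}  # setup times


def extend(node, job):
    """Append job to a partial sequence; node = (sequence, clock, cost)."""
    seq, clock, cost = node
    done = clock + (S[(seq[-1], job)] if seq else 0.0) + P[job]
    return (seq + (job,), done, cost + W[job] * done)


def lower_bound(node):
    """Cost so far plus, for each open job, weight * earliest completion.

    Valid because setup times are non-negative, so an open job j cannot
    complete before clock + P[j].
    """
    seq, clock, cost = node
    return cost + sum(W[j] * (clock + P[j]) for j in P if j not in seq)


def expand(node):
    """Worker task: branch on the next job and bound every child node."""
    children = [extend(node, j) for j in P if j not in node[0]]
    return [(lower_bound(c), c) for c in children]


def solve(workers=4):
    """Master loop: hand batches of active nodes to workers, prune results."""
    best_cost, best_seq = float("inf"), ()
    active = [((), 0.0, 0.0)]  # root node: empty sequence
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while active:
            batch, active = active[:workers], active[workers:]
            for children in pool.map(expand, batch):
                for bound, child in children:
                    if bound >= best_cost:
                        continue  # pruned by the incumbent
                    if len(child[0]) == len(P):
                        best_cost, best_seq = child[2], child[0]
                    else:
                        active.append(child)
    return best_cost, best_seq


def brute_force():
    """Reference optimum by full enumeration, for checking the sketch."""
    best = float("inf")
    for perm in permutations(P):
        node = ((), 0.0, 0.0)
        for job in perm:
            node = extend(node, job)
        best = min(best, node[2])
    return best


cost, sequence = solve(workers=4)
print(cost == brute_force())  # True
```

Only the master updates the incumbent and the list of active nodes, so the workers need no shared state. In a distributed-memory setting, the calls to the pool would be replaced by messages between the master and the worker processes.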

Computational experiments

In this section, we report the results of an extensive computational study to test the efficiency of the parallelized b&p algorithm from Section 4 when it is executed using multiple processes in an HPC environment.

Scalability in experiments of the first type

From the speedup curves in Figs. 2 and 3, we can see that the average speedup and the maximum speedup can differ substantially within instance sizes. This is driven by the fact that serial wall times differ substantially within our instance sizes as the coefficients of variation (CV) for the wall times lie between 0.9 and 1.8, see Table 3. The size of the b&b tree, which is another indicator for parallelization potential, varies to the same extent with CVs between 0.8 and 1.9, see Table D.8. It

Conclusions

In this paper, we adapt a b&p algorithm, which was originally developed by Lopes and de Carvalho (2007), to the strongly NP-hard scheduling problem R/sijk, Mj/∑wjCj. We suggest, implement, and computationally validate a master/worker parallelization strategy for the adapted algorithm, thereby bridging the gap between the largely unconnected fields of scheduling problems and HPC. We use multiple processes to solve concurrently active nodes of the b&b tree simultaneously. Our computational

Acknowledgments

This research has been funded by the Federal Ministry of Education and Research of Germany in the framework of KUBAS (project number 13N13942). Simulations were performed with computing resources granted by RWTH Aachen University under project prep0011.

References (56)

  • V. Mauch et al. High performance cloud computing. Future Gener. Comput. Syst. (2013)
  • M.R. de Paula et al. A non-delayed relax-and-cut algorithm for scheduling problems with parallel machines, due dates and sequence-dependent setup times. Comput. Oper. Res. (2010)
  • M.X. Weng et al. Unrelated parallel machine scheduling with setup consideration and a total weighted completion time objective. Int. J. Prod. Econ. (2001)
  • F. Wex et al. Emergency response in natural disaster management: allocation and scheduling of rescue units. Eur. J. Oper. Res. (2014)
  • Z. Zhu et al. Minimizing the sum of earliness/tardiness in multi-machine scheduling: a mixed integer programming approach. Comput. Ind. Eng. (2000)
  • A. Aitzai et al. Parallel branch-and-bound and parallel PSO algorithms for job shop scheduling problem with blocking. Int. J. Oper. Res. (2013)
  • D.E. Akyol et al. Multi-machine earliness and tardiness scheduling problem: an interconnected neural network approach. Int. J. Adv. Manuf. Technol. (2008)
  • J.M. van den Akker et al. Parallel machine scheduling by column generation. Oper. Res. (1999)
  • F. Alvelos et al. A matheuristic based on column generation for parallel machine scheduling with sequence dependent setup times. Computational Management Science (2016)
  • J.-P.M. Arnaout et al. Minimizing the total weighted completion time on unrelated parallel machines with stochastic times. Proceedings of the IEEE Winter Simulation Conference (2005)
  • J.F. Bard et al. A branch-and-price algorithm for parallel machine scheduling with time windows and job priorities. Nav. Res. Logist. (2006)
  • C. Barnhart et al. Branch-and-price: column generation for solving huge integer programs. Oper. Res. (1998)
  • G. Bell et al. What’s next in high-performance computing? Commun. ACM (2002)
  • A. Borisenko et al. Optimal design of multi-product batch plants using a parallel branch-and-bound method. International Conference on Parallel Computing Technologies (2011)
  • P. Brucker. Scheduling Algorithms (2007)
  • J.-F. Chen. Unrelated parallel-machine scheduling to minimize total weighted completion time. J. Intell. Manuf. (2015)
  • Z.-L. Chen et al. Solving parallel machine scheduling problems by column generation. INFORMS J. Comput. (1999)
  • Z.-L. Chen et al. Exact algorithms for scheduling multiple families of jobs on parallel machines. Nav. Res. Logist. (2003)