Using high performance computing for unrelated parallel machine scheduling with sequence-dependent setup times: Development and computational evaluation of a parallel branch-and-price algorithm
Introduction
In this article, we study the parallel machine scheduling problem on unrelated machines with sequence- and machine-dependent setup times, machine eligibility restrictions, and the total weighted completion time as objective function. This problem is well-known in the scheduling literature and classified as R/sijk, Mj/∑wjCj (Pinedo, 2012) in the established α/β/γ-notation (Graham et al., 1979). For convenience, we refer to this problem as UPMSP (Unrelated Parallel Machine Scheduling Problem) in the remainder of this article. UPMSP is NP-hard in the strong sense since the more specific problem P//∑wjCj of minimizing the total weighted completion time on identical machines is NP-hard in the strong sense (Skutella and Woeginger, 2000).
The problem can be described as follows: a set of jobs has to be processed on a set of machines, where (i) each job has to be processed exactly once, (ii) the processing of a job must not be interrupted (non-preemption), and (iii) a machine may be capable of processing a job or not (machine eligibility restrictions). Processing times depend on the job and the processing machine, while sequence-dependent setup times depend on the job, the preceding job, and the processing machine. Furthermore, each job has a priority level (weight). The goal of UPMSP is to find a feasible set of machine schedules with minimal total weighted completion time.
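The objective described above can be made concrete with a small evaluation routine. The following is a minimal sketch, not the authors' formulation: the function name, the dictionary-based data layout, and the toy instance are hypothetical, and it assumes that the first job on a machine incurs a setup time of 0 unless an entry with predecessor `None` is supplied.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Job = Hashable
Machine = Hashable


def total_weighted_completion_time(
    schedules: Dict[Machine, List[Job]],
    proc: Dict[Tuple[Job, Machine], float],
    setup: Dict[Tuple[Optional[Job], Job, Machine], float],
    weight: Dict[Job, float],
) -> float:
    """Evaluate sum_j w_j * C_j for a set of machine schedules.

    A (job, machine) pair missing from `proc` is treated as ineligible.
    Missing setup entries default to 0 (an assumption of this sketch).
    """
    total = 0
    seen = set()
    for machine, sequence in schedules.items():
        clock = 0       # completion time of the last job on this machine
        prev = None     # predecessor on this machine (None for the first job)
        for job in sequence:
            if (job, machine) not in proc:
                raise ValueError(f"job {job!r} is not eligible on machine {machine!r}")
            if job in seen:
                raise ValueError(f"job {job!r} is scheduled more than once")
            seen.add(job)
            clock += setup.get((prev, job, machine), 0) + proc[(job, machine)]
            total += weight[job] * clock
            prev = job
    if seen != set(weight):
        raise ValueError("every job must be processed exactly once")
    return total


# Hypothetical toy instance: three jobs, two machines, job 3 only eligible on B.
proc = {(1, "A"): 3, (2, "A"): 2, (1, "B"): 5, (3, "B"): 4}
setup = {(1, 2, "A"): 1, (2, 1, "A"): 2}
weight = {1: 2, 2: 1, 3: 3}

example_cost = total_weighted_completion_time({"A": [1, 2], "B": [3]}, proc, setup, weight)
# Machine A: C_1 = 3, C_2 = 3 + 1 + 2 = 6; machine B: C_3 = 4; cost = 2*3 + 1*6 + 3*4 = 24
```

Swapping the two jobs on machine A changes both the setup time and the completion times, which is why the sequence on each machine matters and not only the assignment of jobs to machines.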
There are several real-life settings where decision makers face UPMSP. In disaster response, rescue units (machines) are scheduled to process emergency incidents (jobs) with different priorities (weights). Setups are required by rescue units for traveling between incidents’ locations (Wex et al., 2014). Another application is traffic flow networks, where repairmen (machines) have to repair broken toll plazas or toll bridges (jobs) with different traffic throughput rates (weights). Setups are represented by travel times of repairmen between toll plazas or bridges (Weng et al., 2001). UPMSP is also found in injection molding departments, where machines are used to produce different components (jobs) with certain importance (weights) and where setup times are required for dies or molds (Chen, 2015).
Solving medium- or large-scale instances of UPMSP to optimality is computationally challenging, even for the more specific problem without machine eligibility restrictions and with machine-independent setup times, as recent studies show (Arnaout and Rabadi, 2005; Chen, 2015; Rauchecker and Schryen, 2015; Schryen et al., 2015; Tsai and Tseng, 2007; Weng et al., 2001; Wex et al., 2014). To overcome the efficiency limitations of sequentially executed algorithms, researchers can rely on recent technological developments in high performance computing (HPC), which refers to the use of parallel computing architectures. HPC is particularly relevant since speed improvement on a single core is limited for technological reasons (Hager and Wellein, 2010, p. 23). Modern PCs and even smartphones have multiple cores, which allow for parallel code execution. At the extreme, computer clusters and supercomputers containing up to several million cores are pushing the boundaries of HPC (TOP500, 2017). HPC has been successfully applied to a broad range of problems in many scientific disciplines, including biology, chemistry, physics, geology, weather forecasting, aerodynamic research, and computer science (Bell and Gray, 2002; Vecchiola et al., 2009), but there is little research on using HPC for scheduling problems. Nowadays, taking advantage of HPC does not require access to a supercomputer; it can also be done on computing clusters, which have become commodity IT resources. They are available at many universities and are offered by some cloud providers, for example as part of the Amazon Web Services (Mauch et al., 2013). To sum up, HPC has become not only technologically feasible but also economically affordable (Hager and Wellein, 2010, p. 1).
However, in order to exploit the capabilities of HPC, algorithms have to be parallelized. In this work, we adapt a serial branch-and-price (b&p) algorithm, which was suggested by Lopes and de Carvalho (2007), to solve UPMSP. Their algorithm was designed for the parallel machine scheduling problem on unrelated machines with sequence-dependent setup times, machine availability dates, release dates, due dates, and the total weighted tardiness as objective function. We suggest an algorithmic parallelization of the adapted b&p algorithm and conduct extensive computational experiments on an HPC cluster to analyze the scalability of our parallel approach on a large number of cores.
Scheduling problems appear in many forms and have attracted thousands of research papers. In order to structure this large body of research, comprehensive literature reviews (e.g., Allahverdi, 2015; Allahverdi et al., 1999, 2008; Cheng and Sin, 1990) and books (e.g., Brucker, 2007; Pinedo, 2012; Rabadi, 2016) have been published. We can divide scheduling problems into problems which account for setup times and those which do not. Allahverdi et al. (1999, 2008) and Allahverdi (2015) provide comprehensive surveys about all scheduling problems accounting for setup times. They further divide these into problems with sequence-independent setup times and problems with sequence-dependent setup times. In our overview, we focus on problems that can be represented as R/STSD/γ, i.e., scheduling on unrelated parallel machines with sequence-dependent setup times. We further restrict γ to objective functions that are at least as general as the total weighted completion time.
According to Allahverdi et al. (1999, 2008) and Allahverdi (2015), the first research focusing on this type of problem was conducted by Zhu and Heady (2000), who considered due dates and the total weighted earliness/tardiness objective function, which is equivalent to the total weighted completion time when all earliness weights and due dates are 0. They modeled the problem as a mixed-integer program (MIP) and were able to find optimal solutions for up to 9 jobs and 3 machines. Akyol and Bayhan (2008) present an exact Artificial Neural Network algorithm, but they were not able to solve larger instance sizes. A Tabu Search was presented by Bozorgirad and Logendran (2012), while Zeidi and Mohammad Hosseini (2015) chose a hybrid Genetic Algorithm/Simulated Annealing approach.
In the absence of earliness weights, the total weighted earliness/tardiness objective turns into the total weighted tardiness objective function (∑wjTj), which is equivalent to the total weighted completion time when all due dates are 0. Tavakkoli-Moghaddam and Aramon-Bajestani (2009), Lopes and de Carvalho (2007), and Lopes et al. (2014) present branch-and-bound (b&b) and branch-and-price (b&p) algorithms based on MIP formulations for R/STSD/∑wjTj, with the b&b algorithm being capable of solving instances with up to 10 jobs and 4 machines and the b&p algorithms being capable of solving instances with up to 180 jobs and 50 machines. The total weighted tardiness objective was further tackled with Genetic Algorithms (Joo and Kim, 2012), Tabu Search (Logendran et al., 2007), Simulated Annealing (Kim et al., 2003), and several other heuristic approaches (Alvelos et al., 2016; de Paula et al., 2010; Lin and Hsieh, 2014; Rauchecker and Schryen, 2015; Zhang et al., 2007). Our problem R/STSD/∑wjCj has been formulated as a quadratic binary program and tested with the off-the-shelf solver Gurobi by Wex et al. (2014) and Schryen et al. (2015). Their computational studies indicate that this formulation and solution strategy are not efficient, as they fail to compute optimal solutions for small-sized instances consisting of 40 jobs and 10 machines within several hours. Other approaches approximate this problem with Genetic Algorithms (Tsai and Tseng, 2007), b&p-based heuristics (Rauchecker and Schryen, 2015), and several problem-specific heuristics (Arnaout and Rabadi, 2005; Chen, 2015; Weng et al., 2001).
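The two equivalences used in this overview can be written out explicitly. The following derivation is a sketch under assumed notation: the due date d_j, the earliness E_j, and the separate earliness and tardiness weights w'_j and w''_j are standard symbols introduced here for illustration and do not appear in this excerpt; w_j, C_j, and T_j are used as in the text.

```latex
% Assumed notation (not taken from the excerpt):
%   d_j = due date, E_j = earliness, T_j = tardiness of job j,
%   w'_j, w''_j = earliness and tardiness weights of job j.
\begin{align*}
E_j &= \max\{0,\; d_j - C_j\}, \qquad T_j = \max\{0,\; C_j - d_j\},\\
\sum_j \bigl(w'_j E_j + w''_j T_j\bigr)
    &= \sum_j w''_j T_j
    && \text{if } w'_j = 0 \text{ for all } j,\\
\sum_j w_j T_j
    &= \sum_j w_j \max\{0,\; C_j\} = \sum_j w_j C_j
    && \text{if } d_j = 0 \text{ for all } j.
\end{align*}
% The last equality uses C_j >= 0, which holds because completion times are nonnegative.
```

Hence any exact algorithm for the total weighted tardiness objective also covers the total weighted completion time as the special case with all due dates set to 0.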
The common problem of (meta-) heuristic approaches is that they usually cannot guarantee optimality. Consequently, efficient exact algorithms are not only desirable for solving large real-world problems to optimality but are also an important tool for benchmarking (meta-) heuristics. However, our literature review shows that (i) in contrast to heuristics, very little research on exact algorithms for UPMSP and related problems exists and (ii) none of the suggested exact algorithms has been parallelized in order to leverage the potential of modern HPC capabilities. At the same time, the most promising of the presented exact approaches use b&b and especially b&p algorithms, which offer a high potential for parallelization (e.g., Eckstein, 1994; Migdalas et al., 2013; Ralphs et al., 2003). B&p algorithms were conceptualized by Barnhart et al. (1998) and use both a b&b algorithm and a column generation procedure (Dantzig and Wolfe, 1960; Desaulniers et al., 2006; Lübbecke and Desrosiers, 2005) to solve integer programs with many variables. This kind of algorithm has widely been used for tackling scheduling problems, including studies by Bard and Rojanasoonthon (2006), Fei et al. (2008), van den Akker et al. (1999), and Chen and Powell (1999, 2003). However, only a few studies use parallel implementations of b&b or b&p algorithms to tackle scheduling problems; see, for instance, Perregaard and Clausen (1998), Clausen and Perregaard (1999), Crespo Abril and Maroto Alvarez (2005), Aitzai and Boudhar (2013), and Chakroun et al. (2013). Just recently, a new parallel b&p-based heuristic for UPMSP has been suggested (Rauchecker and Schryen, 2015). We contribute to closing the aforementioned research gap by suggesting and computationally validating an exact parallel b&p algorithm for UPMSP as outlined in Section 1.2.
In Section 2, we present the mathematical formulation of our scheduling problem. In Section 3, we propose an adaptation of the serial b&p algorithm suggested by Lopes and de Carvalho (2007) to UPMSP. Section 4 presents a parallelized version of the adapted b&p algorithm. To the best of our knowledge, this is the first time that an exact b&p algorithm for a scheduling problem has been parallelized. In Section 5, we demonstrate the applicability of the parallelized algorithm on a Linux-based HPC cluster with extensive numerical experiments and measure its performance using established scalability metrics. With our experiments, we show that our parallelization approach achieves high efficiencies with even superlinear speedups for some instances. We further show that the wall time of our tested instances is reduced from up to 94 h for the most difficult test instance in serial execution to less than 6 min when the algorithm is executed on 960 processes. We discuss these findings in Section 6 before we finally conclude in Section 7.
Section snippets
Problem formulation
We presented an overview of articles on problems which are at least as general as UPMSP in Section 1.1. Regarding exact algorithms, the computational results from Lopes and de Carvalho (2007) and Lopes et al. (2014) for the scheduling problem R/STSD/∑wjTj are most promising in terms of efficiency and, in addition, they solve their model by a highly parallelizable b&p algorithm. Consequently, we adapt their binary linear formulation to UPMSP. We introduce the notation for our formulation in the
Serial branch-and-price algorithm for UPMSP
In this section, we present and discuss an adaptation of a b&p algorithm—introduced by Lopes and de Carvalho (2007) for the problem R/STSD/∑wjTj—to UPMSP. A b&p algorithm is a specific form of a b&b algorithm where all linear relaxations are solved by a column generation procedure (Dantzig and Wolfe, 1960). A first overview on solving integer programs (IPs) with b&p algorithms was provided by Barnhart et al. (1998). A high-level pseudo code of a b&p algorithm that solves general IPs is
Parallelization of the branch-and-price algorithm
In this section, we explain how we parallelize our b&p algorithm from Section 3. According to Gendron and Crainic (1994), there are three types of parallelism for b&b algorithms: first, the solution of single b&b nodes can be executed in parallel (type 1). Second, the b&b tree itself can be parallelized by solving concurrently active nodes of the b&b tree simultaneously (type 2). Third, multiple b&b trees can be built and explored in parallel (type 3). It is also possible to combine those types.
Computational experiments
In this section, we report the results of an extensive computational study to test the efficiency of the parallelized b&p algorithm from Section 4 when it is executed using multiple processes in an HPC environment.
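Scalability of a parallel algorithm is commonly summarized by speedup and efficiency. The snippet below is a minimal sketch using the standard textbook definitions (serial wall time divided by parallel wall time, and speedup divided by the number of processes); the exact metrics used in the study are not spelled out in this excerpt, and the function names are hypothetical. The numbers plugged in are the bounds quoted in the introduction (up to 94 h in serial execution, less than 6 min on 960 processes), so the result is only a rough indication for the most difficult instance.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup S_p = T_1 / T_p (both times in the same unit)."""
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, n_processes: int) -> float:
    """Efficiency E_p = S_p / p; values above 1 indicate superlinear speedup."""
    return speedup(serial_time, parallel_time) / n_processes


# Bounds quoted in the introduction, taken at face value for illustration.
serial_minutes = 94 * 60    # 94 h in serial execution
parallel_minutes = 6        # reported as "less than 6 min" on 960 processes

s = speedup(serial_minutes, parallel_minutes)           # 940.0
e = efficiency(serial_minutes, parallel_minutes, 960)   # about 0.98
```

Because the parallel wall time is reported only as an upper bound, a serial time of 94 h would imply a speedup above 940 and an efficiency above roughly 0.98 for that instance.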
Scalability in experiments of the first type
From the speedup curves in Figs. 2 and 3, we can see that the average speedup and the maximum speedup can differ substantially within instance sizes. This is driven by the fact that serial wall times differ substantially within our instance sizes as the coefficients of variation (CV) for the wall times lie between 0.9 and 1.8, see Table 3. The size of the b&b tree, which is another indicator for parallelization potential, varies to the same extent with CVs between 0.8 and 1.9, see Table D.8. It
Conclusions
In this paper, we adapt a b&p algorithm, which was originally developed by Lopes and de Carvalho (2007), to the strongly NP-hard scheduling problem R/sijk, Mj/∑wjCj. We suggest, implement, and computationally validate a master/worker parallelization strategy for the adapted algorithm, thereby bridging the gap between the largely unconnected fields of scheduling problems and HPC. We use multiple processes to solve concurrently active nodes of the b&b tree simultaneously. Our computational
Acknowledgments
This research has been funded by the Federal Ministry of Education and Research of Germany in the framework of KUBAS (project number 13N13942). Simulations were performed with computing resources granted by RWTH Aachen University under project prep0011.
References (56)
The third comprehensive survey on scheduling problems with setup times/costs. Eur. J. Oper. Res. (2015)
A review of scheduling research involving setup considerations. Omega (Westport) (1999)
A survey of scheduling problems with setup times or costs. Eur. J. Oper. Res. (2008)
Sequence-dependent group scheduling problem on unrelated-parallel machines. Expert Syst. Appl. (2012)
Combining multi-core and GPU computing for solving combinatorial optimization problems. J. Parallel Distrib. Comput. (2013)
A state-of-the-art review of parallel-machine scheduling research. Eur. J. Oper. Res. (1990)
Solving surgical cases assignment problem by a branch-and-price approach. Int. J. Prod. Econ. (2008)
Optimization and approximation in deterministic sequencing and scheduling: a survey. Ann. Discrete Math. (1979)
Unrelated parallel machine scheduling with setup times and a total weighted tardiness objective. Robot. Comput. Integr. Manuf. (2003)
Scheduling unrelated parallel machines with sequence-dependent setups. Comput. Oper. Res. (2007)