An effective Parallel Multistart Tabu Search for Quadratic Assignment Problem on CUDA platform
Highlights
► Parallel Multistart Tabu Search for QAP, implemented on GPU (CUDA), is proposed.
► Analysis of parallelisation possibilities and memory access patterns is presented.
► PMTS runs up to faster than a single-core CPU or faster than a 6-core CPU.
► PMTS finds good quality (often optimal or the best known) solutions in a short time.
► The quality of solutions improves even further when PMTS is run for a longer time.
Introduction
The Quadratic Assignment Problem (QAP) was first defined by Koopmans and Beckmann [19]. In QAP, $n$ facilities have to be assigned to $n$ locations in order to minimise the cost. Distances between locations are stored in matrix $D = (d_{kl})$ and flows between facilities in matrix $F = (f_{ij})$. Total cost of any assignment (permutation) $\pi$ can be calculated as:

$$C(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} \, d_{\pi(i)\pi(j)}$$

The objective is to find the assignment $\pi^{*}$ which minimises the cost function.
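The cost function can be made concrete with a short C sketch. It assumes the standard Koopmans-Beckmann form, in which the cost is the sum over all ordered pairs of facilities of the flow between them multiplied by the distance between their assigned locations. The function name `qap_cost`, the row-major integer matrices and the zero-based permutation are illustrative assumptions, not the data layout used in the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of one assignment for an n x n QAP instance (hypothetical helper).
 * flow and dist are row-major arrays of n*n entries.
 * perm[i] is the zero-based location assigned to facility i and must be
 * a permutation of 0..n-1. */
long qap_cost(size_t n, const int *flow, const int *dist, const size_t *perm)
{
    assert(n > 0 && flow != NULL && dist != NULL && perm != NULL);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* flow between facilities i and j, times the distance
             * between the locations they are assigned to */
            total += (long)flow[i * n + j] * dist[perm[i] * n + perm[j]];
        }
    }
    return total;
}
```

A full evaluation like this takes O(n²) operations, which is why heuristics for QAP usually avoid recomputing it from scratch after every small change to the permutation.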
The most popular application for QAP is facility layout, e.g. planning university campus [9] or hospital facilities [12], [20], or more recently, DNA microarray layout [8]. In addition, QAP was used to solve Steinberg wiring problem [25], [2], to design a typewriter keyboard [24], and even dartboard design [11]. A recent survey on QAP history and applications is provided by Loiola et al. [21].
Unfortunately, QAP belongs to the class of NP-hard problems [14]; it is therefore feasible to compute the exact solution only for relatively small instances, with sizes up to 30 [21]. This limitation raised interest in heuristics for finding near-optimum solutions for QAP. They include algorithms based on Tabu Search (e.g. [28]), Simulated Annealing (e.g. [5]), Scatter Search (e.g. [6]), Genetic Algorithms (e.g. [10]), Memetic Algorithms (e.g. [23]), Ant Colony Optimisation (e.g. [13], [27]) and Iterated Local Search (e.g. [26]). Even with such sophisticated techniques, the time required to obtain good quality solutions varies from several minutes for medium-size datasets to a few hours for the larger ones (e.g. [17]).
The introduction of CUDA and of very powerful GPU hardware designed for High Performance Computing has created an opportunity to considerably speed up computation. Several papers have been published on techniques to speed up heuristics for discrete optimisation problems. Janiak et al. [18] proposed Tabu Search for the Travelling Salesman and Permutation Flowshop Problems, but used a pre-CUDA API (Microsoft XNA). Luong et al. [22] proposed a GPU implementation of Tabu Search for QAP. The shortcoming of both implementations is that only the evaluation is done on the GPU. The remaining operations are performed on the CPU, which harms performance by introducing a CPU–GPU data transfer after each iteration. The CUDA Tabu Search for the Permutation Flowshop Problem presented in [7] alleviates that problem by introducing a GPU implementation of the tabu list. Finally, the CUDA implementation of Tabu Search for QAP (SIMD-TS) described by Zhu et al. [29] employs non-cooperative parallelism by running multiple independent Tabu Search instances. Since classic Tabu Search is deterministic, some randomness is introduced to diversify the search paths.
This paper proposes Parallel Multistart Tabu Search (PMTS) for QAP, which aims to combine the effectiveness of the Multistart Tabu Search approach with the high performance provided by GPU hardware. It employs the GPU implementation of the tabu list from Czapiński and Barnes [7]. Unlike SIMD-TS presented in [29], the algorithm proposed in this paper is deterministic, and it benefits from communication between parallel Tabu Search instances. Section 2 presents the development of the Parallel Multistart Tabu Search method. The GPU implementation of PMTS is described in Section 3, and the results of computational experiments are presented and discussed in Section 4. Finally, conclusions are presented in Section 5.
Section snippets
Parallel Multistart Tabu Search for QAP
In this section, the classic Tabu Search metaheuristic is presented, along with the motivation behind porting it to the GPU (Section 2.1). Then, in Section 2.2, the Multistart Tabu Search and Parallel Multistart Tabu Search schemes are described. Finally, in Section 2.3, an algorithm based on the PMTS scheme for solving the Quadratic Assignment Problem is presented.
CUDA implementation of Parallel Multistart Tabu Search for QAP
In this section, details of the GPU implementation of Parallel Multistart Tabu Search are presented. Four approaches to neighbourhood evaluation are described in Section 3.1, followed by GPU implementation details for the other steps of classic Tabu Search (Section 3.2) and the implementation of the PMTS scheme on the GPU (Section 3.3).
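To illustrate what neighbourhood evaluation involves, the following sequential C sketch assumes a pairwise-exchange neighbourhood, a common choice in Tabu Search for QAP, in which each neighbour is obtained by swapping the locations of two facilities. It uses the well-known O(n) formula for the change in cost caused by one swap, which holds for both symmetric and non-symmetric instances. It is not one of the four GPU approaches of Section 3.1, which are not reproduced in this excerpt. The names `qap_cost`, `qap_swap_delta` and `qap_best_swap`, and the row-major integer layout, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Element (row, col) of a row-major n x n matrix, widened to long. */
static long at(const int *m, size_t n, size_t row, size_t col)
{
    return m[row * n + col];
}

/* Full O(n^2) cost; used here only as a reference for the delta. */
long qap_cost(size_t n, const int *flow, const int *dist, const size_t *perm)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            total += at(flow, n, i, j) * at(dist, n, perm[i], perm[j]);
    return total;
}

/* Change in cost if facilities r and s exchange locations, in O(n).
 * Only terms with a row or column index equal to r or s can change. */
long qap_swap_delta(size_t n, const int *flow, const int *dist,
                    const size_t *perm, size_t r, size_t s)
{
    assert(r < n && s < n && r != s);
    const size_t pr = perm[r], ps = perm[s];
    long d = at(flow, n, r, r) * (at(dist, n, ps, ps) - at(dist, n, pr, pr))
           + at(flow, n, r, s) * (at(dist, n, ps, pr) - at(dist, n, pr, ps))
           + at(flow, n, s, r) * (at(dist, n, pr, ps) - at(dist, n, ps, pr))
           + at(flow, n, s, s) * (at(dist, n, pr, pr) - at(dist, n, ps, ps));
    for (size_t k = 0; k < n; ++k) {
        if (k == r || k == s)
            continue;
        const size_t pk = perm[k];
        d += at(flow, n, k, r) * (at(dist, n, pk, ps) - at(dist, n, pk, pr))
           + at(flow, n, k, s) * (at(dist, n, pk, pr) - at(dist, n, pk, ps))
           + at(flow, n, r, k) * (at(dist, n, ps, pk) - at(dist, n, pr, pk))
           + at(flow, n, s, k) * (at(dist, n, pr, pk) - at(dist, n, ps, pk));
    }
    return d;
}

/* Evaluates all n(n-1)/2 swaps and returns the smallest delta.
 * Ties are broken by the first pair found in lexicographic order.
 * The tabu status of moves is deliberately ignored in this sketch. */
long qap_best_swap(size_t n, const int *flow, const int *dist,
                   const size_t *perm, size_t *best_r, size_t *best_s)
{
    assert(n >= 2 && best_r != NULL && best_s != NULL);
    long best = qap_swap_delta(n, flow, dist, perm, 0, 1);
    *best_r = 0;
    *best_s = 1;
    for (size_t r = 0; r < n; ++r) {
        for (size_t s = r + 1; s < n; ++s) {
            const long d = qap_swap_delta(n, flow, dist, perm, r, s);
            if (d < best) {
                best = d;
                *best_r = r;
                *best_s = s;
            }
        }
    }
    return best;
}
```

Each of the n(n-1)/2 delta computations is independent of the others, which is what makes this step a natural candidate for data-parallel execution.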
Computational experiments
Computational experiments were conducted on a six-core Intel Core i7-980x 3.33 GHz CPU with 12 GB of memory, running Ubuntu 10.04 (64 bit). The GPU was an NVIDIA GTX 480 (1.5 GB dedicated memory) with 15 multiprocessors of 32 scalar processor cores each (480 cores in total). All algorithms were implemented in C, using CUDA Toolkit 3.2. Datasets used in performance experiments were randomly generated following the method proposed by Taillard [28]. The benchmark suite consists of three
Conclusions
This paper proposes Parallel Multistart Tabu Search (PMTS) for Quadratic Assignment Problem (QAP), implemented on a GPU, using the CUDA platform. PMTS is based on multiple Tabu Search instances, running in parallel entirely on the GPU. PMTS runs up to (symmetric instances), and up to (non-symmetric instances) faster than six-core MPI implementation on a modern, high-end CPU. While an MPI implementation shows an almost linear algorithmic speed-up, the GPU implementation of PMTS runs up
Acknowledgments
The author would like to thank Dr. Stuart Barnes and Robert Sawko at Cranfield University for their invaluable comments and for proof-reading the paper.
References (29)
- et al. QAPLIB—a Quadratic Assignment Problem library. European Journal of Operational Research (1991)
- et al. A thermodynamically motivated simulation procedure for combinatorial optimization problems. European Journal of Operational Research (1984)
- An improved annealing scheme for the QAP. European Journal of Operational Research (1990)
- et al. Tabu search with two approaches to parallel flowshop evaluation on CUDA platform. Journal of Parallel and Distributed Computing (2011)
- et al. Campus building arrangement using TOPAZ. Transportation Science (1972)
- Future paths for integer programming and links to artificial intelligence. Computers & Operations Research (1986)
- et al. A survey for the Quadratic Assignment Problem. European Journal of Operational Research (2007)
- Iterated local search for the Quadratic Assignment Problem. European Journal of Operational Research (2006)
- Robust taboo search for the Quadratic Assignment Problem. Parallel Computing (1991)
- et al. The reactive tabu search. INFORMS Journal on Computing (1994)
- A new genetic algorithm for the Quadratic Assignment Problem. INFORMS Journal on Computing
Cited by (49)
Quadratic assignment problem variants: A survey and an effective parallel memetic iterated tabu search
2021, European Journal of Operational Research
Citation Excerpt: A random diversification operator is called after some TS iterations to avoid stagnation. An improved version of this algorithm is presented by Czapiński (2013). The Parallel Multistart Tabu Search (PMTS) implements the CoTS with two levels of parallelism, first for the neighborhood evaluation, as in Zhu et al. (2010), and second for the tabu list management and move selection, achieving a much higher speedup.
Parallel computational optimization in operations research: A new integrative framework, literature review and research directions
2020, European Journal of Operational Research
A hybrid method integrating an elite genetic algorithm with tabu search for the quadratic assignment problem
2020, Information Sciences
Citation Excerpt: By introducing two new reaction strategies for tabu search, Fescioglu-Unver and Kokar [18] proposed a self-controlling tabu search algorithm for the QAP, in which the first strategy treats the tabu search algorithm as a target system to be controlled and uses a control-theoretic approach to adjust the parameters that affect search intensification, and the second strategy is a flexible diversification strategy that can adjust the parameters based on the search history. Because tabu search algorithms allow a natural parallel implementation by dividing the burden of the search in the neighborhood among several processors, several parallel implementations have been proposed [13,22]. Tabu search has also been integrated with other meta-heuristics for solving the QAP, such as the whale optimization algorithm [2], and biogeography-based optimization algorithm [28], with the important exception of genetic algorithms.
A cooperative GPU-based Parallel Multistart Simulated Annealing algorithm for Quadratic Assignment Problem
2018, Engineering Science and Technology, an International Journal
Citation Excerpt: PMSA obtained the BKS in 7 of 12 instances. The runtimes for PMTS [20] ranged from 0.15 to 16.9 min. The runtimes for PMSA ranged from 5.1 to 11.3 min.
A stagnation-aware cooperative parallel breakout local search algorithm for the quadratic assignment problem
2017, Computers and Industrial Engineering
Development, application, and comparison of hybrid meta-heuristics for urban land-use allocation optimization: Tabu search, genetic, GRASP, and simulated annealing algorithms
2016, Computers, Environment and Urban Systems
Michał Czapiński is currently a Ph.D. Student at Cranfield University. He obtained his M.Sc. in computer science at the University of Wrocław, Institute of Computer Science in 2008, his M.Sc. in Computational and Software Techniques in Engineering at Cranfield University, School of Engineering in 2009, and his B.Sc. in mathematics at the University of Wrocław, Mathematical Institute in 2010. His research interests include distributed and parallel computing, discrete optimisation, and software engineering.