An effective Parallel Multistart Tabu Search for Quadratic Assignment Problem on CUDA platform

https://doi.org/10.1016/j.jpdc.2012.07.014

Abstract

NVidia’s powerful GPU hardware and CUDA platform enable the design of very fast parallel algorithms. Relatively little research has been done so far on GPU implementations of algorithms for computationally demanding discrete optimisation problems. In this paper, the well-known NP-hard Quadratic Assignment Problem (QAP) is considered. Detailed analysis of parallelisation possibilities, memory organisation and access patterns enables the implementation of a fast and effective heuristic for QAP on the GPU: the Parallel Multistart Tabu Search (PMTS). Computational experiments show that PMTS is capable of providing good quality (often optimal or the best known) solutions in a short time, and even better quality solutions in longer runs. PMTS runs up to 420× faster than a single-core counterpart, or 70× faster than a parallel CPU implementation on a high-end six-core CPU.

Highlights

► Parallel Multistart Tabu Search for QAP, implemented on GPU (CUDA), is proposed.
► Analysis of parallelisation possibilities and memory access patterns is presented.
► PMTS runs up to 420× faster than a single-core CPU or 70× faster than a 6-core CPU.
► PMTS finds good quality (often optimal or the best known) solutions in a short time.
► The quality of solutions improves even further when PMTS is run for a longer time.

Introduction

The Quadratic Assignment Problem (QAP) was first defined by Koopmans and Beckmann [19]. In QAP, n facilities have to be assigned to n locations in order to minimise the cost. Distances between locations are stored in an n×n matrix d and flows between facilities in an n×n matrix f. The total cost of any assignment (permutation) π can be calculated as: $C(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{i,j} f_{\pi(i),\pi(j)}$. The objective is to find the assignment π which minimises the cost function.
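As an illustration of the cost function above, the following is a minimal C++ sketch that evaluates C(π) directly from its definition. The names `Matrix` and `qap_cost`, the use of 0-based indices, and the choice of `long long` entries are assumptions made for this example; they are not taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a dense square matrix of integer entries.
using Matrix = std::vector<std::vector<long long>>;

// Evaluates C(pi) = sum over i, j of d[i][j] * f[pi[i]][pi[j]].
// Indices are 0-based here, whereas the formula in the text is 1-based.
// The double loop makes a full evaluation cost O(n^2) operations.
long long qap_cost(const Matrix& d, const Matrix& f,
                   const std::vector<std::size_t>& pi) {
    const std::size_t n = pi.size();
    assert(d.size() == n && f.size() == n);
    long long cost = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            cost += d[i][j] * f[pi[i]][pi[j]];
        }
    }
    return cost;
}
```

For example, with d = {{0,1,2},{1,0,3},{2,3,0}} and f = {{0,5,1},{5,0,2},{1,2,0}}, the identity permutation {0,1,2} has cost 26, while {1,0,2} has cost 24.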

The most popular application for QAP is facility layout, e.g. planning a university campus [9] or hospital facilities [12], [20], or more recently, DNA microarray layout [8]. In addition, QAP was used to solve the Steinberg wiring problem [25], [2], to design a typewriter keyboard [24], and even to design a dartboard [11]. A recent survey on QAP history and applications is provided by Loiola et al. [21].

Unfortunately, QAP belongs to the class of NP-hard problems [14], so computing the exact solution is feasible only for relatively small instances, with n values up to 30 [21]. This limitation raised interest in heuristics for finding near-optimum solutions for QAP. They include algorithms based on Tabu Search (e.g. [28]), Simulated Annealing (e.g. [5]), Scatter Search (e.g. [6]), Genetic Algorithms (e.g. [10]), Memetic Algorithms (e.g. [23]), Ant Colony Optimisation (e.g. [13], [27]) or Iterated Local Search (e.g. [26]). Even with such sophisticated techniques, the time required to obtain good quality solutions varies from several minutes for medium size datasets to a few hours for the larger ones (e.g. [17]).

The introduction of CUDA and very powerful GPU hardware designed for High Performance Computing has created an opportunity to considerably speed up computation. Several papers have been published on techniques to speed up heuristics for discrete optimisation problems. Janiak et al. [18] proposed Tabu Search for the Travelling Salesman and Permutation Flowshop Problems, but using a pre-CUDA API (Microsoft XNA). Luong et al. [22] proposed a GPU implementation of Tabu Search for QAP. The shortcoming of both implementations is that only the evaluation is done on the GPU. The remaining operations are performed on the CPU, which harms performance by introducing a CPU–GPU data transfer after each iteration. The CUDA Tabu Search for the Permutation Flowshop Problem presented in [7] alleviates that problem by introducing a GPU implementation of the tabu list. Finally, the CUDA implementation of Tabu Search for QAP (SIMD-TS), described by Zhu et al. [29], employs non-cooperative parallelism by running multiple independent Tabu Search instances. Since classic Tabu Search is deterministic, some randomness is introduced to diversify the search paths.

This paper proposes Parallel Multistart Tabu Search for QAP, which aims to combine the effectiveness of the Multistart Tabu Search approach with the high performance provided by GPU hardware. It employs the GPU implementation of the tabu list from Czapiński and Barnes [7]. Unlike SIMD-TS presented in [29], the algorithm proposed in this paper is deterministic and it benefits from communication between parallel Tabu Search instances. Section 2 presents the development of the Parallel Multistart Tabu Search method. The GPU implementation of PMTS is described in Section 3, then in Section 4 the results of computational experiments are presented and discussed. Finally, conclusions are presented in Section 5.

Parallel Multistart Tabu Search for QAP

In this section, the classic Tabu Search metaheuristic is presented, along with the motivation behind porting it to the GPU (Section 2.1). Then, in Section 2.2, the Multistart Tabu Search and Parallel Multistart Tabu Search schemes are described. Finally, in Section 2.3, an algorithm based on the PMTS scheme for solving the Quadratic Assignment Problem is presented.
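To make the terms used in this section concrete, the following is a minimal sequential C++ sketch of a deterministic Tabu Search over the pairwise-swap neighbourhood, wrapped in a plain multistart loop. It is not the PMTS algorithm of the paper: the starts here run one after another and do not communicate, every move is evaluated by a full cost recomputation, and the tabu list is indexed by position pairs. All names (`qap_cost`, `tabu_search`, `multistart_tabu_search`) and the `iterations` and `tenure` parameters are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;
using Perm = std::vector<std::size_t>;

// Full O(n^2) evaluation of sum over i, j of d[i][j] * f[p[i]][p[j]].
long long qap_cost(const Matrix& d, const Matrix& f, const Perm& p) {
    long long c = 0;
    for (std::size_t i = 0; i < p.size(); ++i)
        for (std::size_t j = 0; j < p.size(); ++j)
            c += d[i][j] * f[p[i]][p[j]];
    return c;
}

// One deterministic Tabu Search run (illustrative, not the paper's variant).
Perm tabu_search(const Matrix& d, const Matrix& f, Perm current,
                 std::size_t iterations, std::size_t tenure) {
    const std::size_t n = current.size();
    // tabu_until[i][j]: first iteration at which swapping positions i and j
    // is allowed again.
    std::vector<std::vector<std::size_t>> tabu_until(
        n, std::vector<std::size_t>(n, 0));
    Perm best = current;
    long long best_cost = qap_cost(d, f, best);
    for (std::size_t it = 0; it < iterations; ++it) {
        long long move_cost = std::numeric_limits<long long>::max();
        std::size_t bi = n, bj = n;  // n means "no admissible move found"
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = i + 1; j < n; ++j) {
                std::swap(current[i], current[j]);
                const long long c = qap_cost(d, f, current);
                std::swap(current[i], current[j]);  // undo the trial swap
                const bool tabu = tabu_until[i][j] > it;
                const bool aspired = c < best_cost;  // aspiration criterion
                if ((!tabu || aspired) && c < move_cost) {
                    move_cost = c; bi = i; bj = j;
                }
            }
        }
        if (bi == n) continue;  // all moves tabu; wait for tenures to expire
        // Accept the best admissible move even if it worsens the cost.
        std::swap(current[bi], current[bj]);
        tabu_until[bi][bj] = it + 1 + tenure;
        if (move_cost < best_cost) { best_cost = move_cost; best = current; }
    }
    return best;
}

// Multistart: run Tabu Search from each start and keep the best result.
Perm multistart_tabu_search(const Matrix& d, const Matrix& f,
                            const std::vector<Perm>& starts,
                            std::size_t iterations, std::size_t tenure) {
    assert(!starts.empty());
    Perm best;
    long long best_cost = std::numeric_limits<long long>::max();
    for (const Perm& s : starts) {
        Perm r = tabu_search(d, f, s, iterations, tenure);
        const long long c = qap_cost(d, f, r);
        if (c < best_cost) { best_cost = c; best = r; }
    }
    return best;
}
```

Because no random choices are made, the same starts always produce the same result, which mirrors the deterministic character of classic Tabu Search noted in the introduction.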

CUDA implementation of Parallel Multistart Tabu Search for QAP

In this section, details of the GPU implementation of Parallel Multistart Tabu Search are presented. Four approaches to neighbourhood evaluation are described in Section 3.1, followed by GPU implementation details for the other steps of classic Tabu Search (Section 3.2), and the implementation of the PMTS scheme on the GPU (Section 3.3).
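The four evaluation approaches of Section 3.1 are not reproduced in this excerpt. As background only, the sketch below shows a commonly used way to evaluate a single swap move in O(n) time instead of recomputing the full O(n²) cost: only the terms of C(π) that involve the two swapped positions are updated. It is written in plain C++ on the CPU, works for symmetric and non-symmetric matrices, and the names `Matrix` and `swap_delta` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Returns C(pi') - C(pi), where pi' is pi with positions r and s swapped and
// C(pi) = sum over i, j of d[i][j] * f[pi[i]][pi[j]].
long long swap_delta(const Matrix& d, const Matrix& f,
                     const std::vector<std::size_t>& pi,
                     std::size_t r, std::size_t s) {
    const std::size_t n = pi.size();
    assert(r < n && s < n && r != s);
    const std::size_t pr = pi[r], ps = pi[s];
    // Terms where both indices lie in {r, s}.
    long long delta = d[r][r] * (f[ps][ps] - f[pr][pr])
                    + d[r][s] * (f[ps][pr] - f[pr][ps])
                    + d[s][r] * (f[pr][ps] - f[ps][pr])
                    + d[s][s] * (f[pr][pr] - f[ps][ps]);
    // Terms where exactly one index lies in {r, s}.
    for (std::size_t k = 0; k < n; ++k) {
        if (k == r || k == s) continue;
        const std::size_t pk = pi[k];
        delta += d[k][r] * (f[pk][ps] - f[pk][pr])
               + d[k][s] * (f[pk][pr] - f[pk][ps])
               + d[r][k] * (f[ps][pk] - f[pr][pk])
               + d[s][k] * (f[pr][pk] - f[ps][pk]);
    }
    return delta;
}
```

Since the swap neighbourhood contains n(n−1)/2 moves and each delta is independent of the others, the evaluations can in principle be distributed over many threads, which is one reason this step is a natural candidate for a GPU.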

Computational experiments

Computational experiments were conducted on a six-core Intel Core i7-980x 3.33 GHz CPU with 12 GB of memory, running under Ubuntu 10.04 (64 bit). The GPU was an NVidia GTX 480 (1.5 GB dedicated memory) equipped with 15 multiprocessors of 32 scalar processor cores each. All algorithms were implemented in C++, using CUDA Toolkit 3.2. Datasets used in performance experiments were randomly generated following the method proposed by Taillard [28]. The benchmark suite consists of three

Conclusions

This paper proposes Parallel Multistart Tabu Search (PMTS) for the Quadratic Assignment Problem (QAP), implemented on a GPU using the CUDA platform. PMTS is based on multiple Tabu Search instances running in parallel entirely on the GPU. PMTS runs up to 50× (symmetric instances) and up to 70× (non-symmetric instances) faster than a six-core MPI implementation on a modern, high-end CPU. While an MPI implementation shows an almost linear algorithmic speed-up, the GPU implementation of PMTS runs up

Acknowledgments

The author would like to thank Dr. Stuart Barnes and Robert Sawko at Cranfield University for their invaluable comments and for proof-reading the paper.

Michał Czapiński is currently a Ph.D. Student at Cranfield University. He obtained his M.Sc. in computer science at the University of Wrocław, Institute of Computer Science in 2008, his M.Sc. in Computational and Software Techniques in Engineering at Cranfield University, School of Engineering in 2009, and his B.Sc. in mathematics at the University of Wrocław, Mathematical Institute in 2010. His research interests include distributed and parallel computing, discrete optimisation, and software engineering.

References (29)

  • N.W. Brixius, K.M. Anstreicher, The Steinberg wiring problem, Tech. Rep., The University of Iowa,...
  • V.-D. Cung, T. Mautor, P. Michelon, A. Tavares, A scatter search based approach for the Quadratic Assignment Problem,...
  • S. de Carvalho Jr., S. Rahmann, Microarray layout as Quadratic Assignment Problem, in: Proceedings of the German...
  • Z. Drezner, A new genetic algorithm for the Quadratic Assignment Problem, INFORMS Journal on Computing (2003)

Cited by (49)

    • Quadratic assignment problem variants: A survey and an effective parallel memetic iterated tabu search

      2021, European Journal of Operational Research
      Citation Excerpt :

      A random diversification operator is called after some TS iterations to avoid stagnation. An improved version of this algorithm is presented by Czapiński (2013). The Parallel Multistart Tabu Search (PMTS) implements the CoTS with two levels of parallelism, first for the neighborhood evaluation, as in Zhu et al. (2010), and second for the tabu list management and move selection, achieving a much higher speedup.

    • A hybrid method integrating an elite genetic algorithm with tabu search for the quadratic assignment problem

      2020, Information Sciences
      Citation Excerpt :

      By introducing two new reaction strategies for tabu search, Fescioglu-Unver and Kokar [18] proposed a self-controlling tabu search algorithm for the QAP, in which the first strategy treats the tabu search algorithm as a target system to be controlled and uses a control-theoretic approach to adjust the parameters that affect search intensification, and the second strategy is a flexible diversification strategy that can adjust the parameters based on the search history. Because tabu search algorithms allow a natural parallel implementation by dividing the burden of the search in the neighborhood among several processors, several parallel implementations have been proposed [13,22]. Tabu search has also been integrated with other meta-heuristics for solving the QAP, such as the whale optimization algorithm [2], and biogeography-based optimization algorithm [28], with the important exception of genetic algorithms.

    • A cooperative GPU-based Parallel Multistart Simulated Annealing algorithm for Quadratic Assignment Problem

      2018, Engineering Science and Technology, an International Journal
      Citation Excerpt :

      PMSA obtained the BKS in 7 of 12 instances. The runtimes for PMTS [20] ranged from 0.15 to 16.9 min. The runtimes for PMSA ranged from 5.1 to 11.3 min.


