An effective Parallel Multistart Tabu Search for Quadratic Assignment Problem on CUDA platform

doi:10.1016/j.jpdc.2012.07.014

Journal of Parallel and Distributed Computing

Volume 73, Issue 11, November 2013, Pages 1461-1468

https://doi.org/10.1016/j.jpdc.2012.07.014

Abstract

NVidia’s powerful GPU hardware and CUDA platform enable the design of very fast parallel algorithms. Relatively little research has been done so far on GPU implementations of algorithms for computationally demanding discrete optimisation problems. In this paper, the well-known $NP$-hard Quadratic Assignment Problem (QAP) is considered. A detailed analysis of parallelisation possibilities, memory organisation and access patterns enables the implementation of a fast and effective heuristic for QAP on the GPU: the Parallel Multistart Tabu Search (PMTS). Computational experiments show that PMTS is capable of providing good quality (often optimal or the best known) solutions in a short time, and even better solutions in longer runs. PMTS runs up to $420 \times$ faster than a single-core counterpart, or $70 \times$ faster than a parallel CPU implementation on a high-end six-core CPU.

Highlights

► A Parallel Multistart Tabu Search (PMTS) for QAP, implemented on the GPU (CUDA), is proposed.
► An analysis of parallelisation possibilities and memory access patterns is presented.
► PMTS runs up to $420 \times$ faster than a single-core CPU or $70 \times$ faster than a 6-core CPU.
► PMTS finds good quality (often optimal or the best known) solutions in a short time.
► The quality of solutions improves even further when PMTS is run for a longer time.

Introduction

The Quadratic Assignment Problem (QAP) was first defined by Koopmans and Beckmann [19]. In QAP, $n$ facilities have to be assigned to $n$ locations so that the total cost is minimised. Distances between locations are stored in an $n \times n$ matrix $d$, and flows between facilities in an $n \times n$ matrix $f$. The total cost of an assignment (permutation) $\pi$ is calculated as $C(\pi) = \sum_{i=1}^{n} \sum_{j=1}^{n} d_{i,j} \cdot f_{\pi(i),\pi(j)}$. The objective is to find an assignment $\pi^{*}$ that minimises the cost function.
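The cost function can be illustrated with a short Python sketch. It uses 0-based indices instead of the 1-based indices of the formula, and the function names and the 3x3 toy instance are hypothetical, chosen only for illustration. The brute-force search enumerates all $n!$ permutations, which is why exact search quickly becomes impractical as $n$ grows.

```python
from itertools import permutations

def qap_cost(pi, d, f):
    """C(pi) = sum over i, j of d[i][j] * f[pi[i]][pi[j]] (0-based indices)."""
    n = len(pi)
    return sum(d[i][j] * f[pi[i]][pi[j]] for i in range(n) for j in range(n))

def brute_force_optimum(d, f):
    """Exhaustive search over all n! permutations; only feasible for tiny n."""
    n = len(d)
    best = min(permutations(range(n)), key=lambda p: qap_cost(p, d, f))
    return best, qap_cost(best, d, f)

# Hypothetical toy instance (illustrative values, not taken from the paper).
d = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]  # distances between locations
f = [[0, 5, 1], [5, 0, 4], [1, 4, 0]]  # flows between facilities

print(qap_cost([0, 1, 2], d, f))  # 34
print(brute_force_optimum(d, f))  # ((2, 1, 0), 32)
```

In the optimal assignment, the largest flow (5, between facilities 0 and 1) is placed on the shortest distance (1, between locations 1 and 2).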

The most popular application of QAP is facility layout, e.g. planning a university campus [9] or hospital facilities [12], [20], or, more recently, DNA microarray layout [8]. In addition, QAP has been used to solve the Steinberg wiring problem [25], [2], to design a typewriter keyboard [24], and even to design a dartboard [11]. A recent survey of QAP history and applications is provided by Loiola et al. [21].

Unfortunately, QAP belongs to the class of $NP$-hard problems [14]; therefore, computing the exact solution is feasible only for relatively small instances, with $n$ up to 30 [21]. This limitation has raised interest in heuristics for finding near-optimal solutions for QAP. They include algorithms based on Tabu Search (e.g. [28]), Simulated Annealing (e.g. [5]), Scatter Search (e.g. [6]), Genetic Algorithms (e.g. [10]), Memetic Algorithms (e.g. [23]), Ant Colony Optimisation (e.g. [13], [27]) and Iterated Local Search (e.g. [26]). Even with such sophisticated techniques, the time required to obtain good quality solutions ranges from several minutes for medium-size datasets to a few hours for larger ones (e.g. [17]).

The introduction of CUDA and very powerful GPU hardware designed for High Performance Computing has created an opportunity to speed up computation considerably. Several papers have been published on techniques to speed up heuristics for discrete optimisation problems. Janiak et al. [18] proposed Tabu Search for the Travelling Salesman and Permutation Flowshop Problems, but using a pre-CUDA API (Microsoft XNA). Luong et al. [22] proposed a GPU implementation of Tabu Search for QAP. The shortcoming of both implementations is that only the evaluation is done on the GPU. The remaining operations are performed on the CPU, which harms performance by requiring a CPU–GPU data transfer after each iteration. The CUDA Tabu Search for the Permutation Flowshop Problem presented in [7] alleviates this problem by implementing the tabu list on the GPU as well. Finally, the CUDA implementation of Tabu Search for QAP (SIMD-TS) described by Zhu et al. [29] employs non-cooperative parallelism by running multiple independent Tabu Search instances. Since classic Tabu Search is deterministic, some randomness is introduced to diversify the search paths.

This paper proposes Parallel Multistart Tabu Search for QAP, which aims to combine the effectiveness of the Multistart Tabu Search approach with the high performance provided by GPU hardware. It employs the GPU implementation of the tabu list from Czapiński and Barnes [7]. Unlike SIMD-TS presented in [29], the algorithm proposed in this paper is deterministic and benefits from communication between parallel Tabu Search instances. Section 2 presents the development of the Parallel Multistart Tabu Search method. The GPU implementation of PMTS is described in Section 3, and the results of computational experiments are presented and discussed in Section 4. Finally, conclusions are presented in Section 5.

Section snippets

Parallel Multistart Tabu Search for QAP

In this section, the classic Tabu Search metaheuristic is presented, along with the motivation for porting it to the GPU (Section 2.1). Then, in Section 2.2, the Multistart Tabu Search and Parallel Multistart Tabu Search schemes are described. Finally, in Section 2.3, an algorithm based on the PMTS scheme for solving the Quadratic Assignment Problem is presented.
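To make the classic Tabu Search and Multistart schemes more concrete, here is a minimal sequential Python sketch. It assumes a 2-swap neighbourhood, a tabu list keyed on swapped location pairs, a fixed tabu tenure and the usual aspiration criterion. The function names, the parameters (`iters`, `tenure`, `starts`, `seed`) and the independent, non-communicating restarts are all illustrative assumptions; this is not the PMTS scheme of Section 2.2, whose Tabu Search instances communicate with each other.

```python
import random

def qap_cost(pi, d, f):
    """C(pi) = sum over i, j of d[i][j] * f[pi[i]][pi[j]] (0-based indices)."""
    n = len(pi)
    return sum(d[i][j] * f[pi[i]][pi[j]] for i in range(n) for j in range(n))

def tabu_search(start, d, f, iters=200, tenure=5):
    """Deterministic tabu search over the 2-swap neighbourhood (generic sketch)."""
    n = len(start)
    cur = list(start)
    best, best_cost = list(cur), qap_cost(cur, d, f)
    tabu_until = {}  # (r, s) -> first iteration at which swapping r, s is allowed again
    for it in range(iters):
        move, move_cost = None, None
        for r in range(n - 1):
            for s in range(r + 1, n):
                cand = list(cur)
                cand[r], cand[s] = cand[s], cand[r]
                c = qap_cost(cand, d, f)  # full O(n^2) recomputation, for clarity only
                is_tabu = tabu_until.get((r, s), 0) > it
                # Aspiration criterion: a tabu move is accepted if it beats the best.
                if is_tabu and c >= best_cost:
                    continue
                if move_cost is None or c < move_cost:
                    move, move_cost = (r, s), c
        if move is None:  # all moves tabu and none aspirates: wait for tenures to expire
            continue
        r, s = move
        cur[r], cur[s] = cur[s], cur[r]  # take the best admissible move, even if worsening
        tabu_until[move] = it + 1 + tenure
        if move_cost < best_cost:
            best, best_cost = list(cur), move_cost
    return best, best_cost

def multistart_tabu_search(d, f, starts=8, seed=0, **ts_args):
    """Independent TS runs from seeded random starts; returns the overall best."""
    rng = random.Random(seed)
    best, best_cost = None, None
    for _ in range(starts):
        start = list(range(len(d)))
        rng.shuffle(start)
        sol, cost = tabu_search(start, d, f, **ts_args)
        if best_cost is None or cost < best_cost:
            best, best_cost = sol, cost
    return best, best_cost

# Hypothetical toy instance (illustrative values, not taken from the paper).
d = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
f = [[0, 5, 1], [5, 0, 4], [1, 4, 0]]
print(multistart_tabu_search(d, f, starts=4, iters=20, tenure=2))  # ([2, 1, 0], 32)
```

Seeding the random starts keeps the whole run reproducible. The full cost recomputation inside the inner loop is the expensive part, which is why practical implementations evaluate moves incrementally.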

CUDA implementation of Parallel Multistart Tabu Search for QAP

In this section, details of the GPU implementation of Parallel Multistart Tabu Search are presented. Four approaches to neighbourhood evaluation are described in Section 3.1, followed by GPU implementation details for the other steps of classic Tabu Search (Section 3.2) and the implementation of the PMTS scheme on the GPU (Section 3.3).
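The four neighbourhood evaluation approaches of Section 3.1 are not reproduced in this excerpt. As a hedged, sequential illustration only, the sketch below evaluates the 2-swap neighbourhood with a standard $O(n)$ swap-delta formula instead of recomputing the $O(n^2)$ cost for every neighbour, and checks it against full recomputation. The function names are hypothetical. Because each $(r, s)$ move is evaluated independently of the others, a loop like this could plausibly be mapped to one GPU thread or block per move, but that mapping is an assumption, not the paper's design.

```python
import random

def qap_cost(pi, d, f):
    """C(pi) = sum over i, j of d[i][j] * f[pi[i]][pi[j]] (0-based indices)."""
    n = len(pi)
    return sum(d[i][j] * f[pi[i]][pi[j]] for i in range(n) for j in range(n))

def swap_delta(pi, d, f, r, s):
    """Cost change when the facilities at locations r and s are swapped, in O(n).
    Valid for asymmetric d and f and for non-zero diagonals."""
    pr, ps = pi[r], pi[s]
    # Terms whose row and column indices both lie in {r, s}.
    delta = (d[r][r] * (f[ps][ps] - f[pr][pr])
             + d[r][s] * (f[ps][pr] - f[pr][ps])
             + d[s][r] * (f[pr][ps] - f[ps][pr])
             + d[s][s] * (f[pr][pr] - f[ps][ps]))
    # Terms with exactly one index in {r, s}.
    for k in range(len(pi)):
        if k in (r, s):
            continue
        pk = pi[k]
        delta += (d[k][r] * (f[pk][ps] - f[pk][pr])
                  + d[k][s] * (f[pk][pr] - f[pk][ps])
                  + d[r][k] * (f[ps][pk] - f[pr][pk])
                  + d[s][k] * (f[pr][pk] - f[ps][pk]))
    return delta

def evaluate_neighbourhood(pi, d, f):
    """Deltas of all n(n-1)/2 swap moves; the moves are mutually independent."""
    n = len(pi)
    return {(r, s): swap_delta(pi, d, f, r, s)
            for r in range(n - 1) for s in range(r + 1, n)}

def check_against_full_recompute(n=8, seed=1):
    """Compare every delta with an O(n^2) recomputation on a random asymmetric instance."""
    rng = random.Random(seed)
    d = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
    f = [[rng.randint(0, 9) for _ in range(n)] for _ in range(n)]
    pi = list(range(n))
    rng.shuffle(pi)
    base = qap_cost(pi, d, f)
    for (r, s), delta in evaluate_neighbourhood(pi, d, f).items():
        q = list(pi)
        q[r], q[s] = q[s], q[r]
        if qap_cost(q, d, f) - base != delta:
            return False
    return True

print(check_against_full_recompute())  # True
```

Evaluating the whole neighbourhood this way costs $O(n^3)$ per iteration rather than $O(n^4)$ with full recomputation.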

Computational experiments

Computational experiments were conducted on a six-core Intel Core i7-980X 3.33 GHz CPU with 12 GB of memory, running Ubuntu 10.04 (64-bit). The GPU was an NVidia GTX 480 (1.5 GB dedicated memory) with 15 multiprocessors of 32 scalar processor cores each. All algorithms were implemented in C++ using CUDA Toolkit 3.2. Datasets used in the performance experiments were randomly generated following the method proposed by Taillard [28]. The benchmark suite consists of three

Conclusions

This paper proposes Parallel Multistart Tabu Search (PMTS) for the Quadratic Assignment Problem (QAP), implemented on a GPU using the CUDA platform. PMTS is based on multiple Tabu Search instances running in parallel entirely on the GPU. PMTS runs up to $50 \times$ (symmetric instances) and up to $70 \times$ (non-symmetric instances) faster than a six-core MPI implementation on a modern, high-end CPU. While an MPI implementation shows an almost linear algorithmic speed-up, the GPU implementation of PMTS runs up

Acknowledgments

The author would like to thank Dr. Stuart Barnes and Robert Sawko at Cranfield University for their invaluable comments and for proof-reading the paper.

Michał Czapiński is currently a Ph.D. student at Cranfield University. He obtained his M.Sc. in computer science at the University of Wrocław, Institute of Computer Science in 2008, his M.Sc. in Computational and Software Techniques in Engineering at Cranfield University, School of Engineering in 2009, and his B.Sc. in mathematics at the University of Wrocław, Mathematical Institute in 2010. His research interests include distributed and parallel computing, discrete optimisation, and software

References (29)

R.E. Burkard et al., QAPLIB—a Quadratic Assignment Problem library, European Journal of Operational Research (1991).

R.E. Burkard et al., A thermodynamically motivated simulation procedure for combinatorial optimization problems, European Journal of Operational Research (1984).

D.T. Connolly, An improved annealing scheme for the QAP, European Journal of Operational Research (1990).

M. Czapiński et al., Tabu search with two approaches to parallel flowshop evaluation on CUDA platform, Journal of Parallel and Distributed Computing (2011).

J. Dickey et al., Campus building arrangement using TOPAZ, Transportation Science (1972).

F. Glover, Future paths for integer programming and links to artificial intelligence, Computers & Operations Research (1986).

E.M. Loiola et al., A survey for the Quadratic Assignment Problem, European Journal of Operational Research (2007).

T. Stützle, Iterated local search for the Quadratic Assignment Problem, European Journal of Operational Research (2006).

E. Taillard, Robust taboo search for the Quadratic Assignment Problem, Parallel Computing (1991).

R. Battiti et al., The reactive tabu search, INFORMS Journal on Computing (1994).

N.W. Brixius, K.M. Anstreicher, The Steinberg wiring problem, Tech. Rep., The University of Iowa,...

V.-D. Cung, T. Mautor, P. Michelon, A. Tavares, A scatter search based approach for the Quadratic Assignment Problem,...

S. de Cravalho Jr., S. Rahman, Microarray layout as Quadratic Assignment Problem, in: Proceedings of the German...

Z. Drezner, A new genetic algorithm for the Quadratic Assignment Problem, INFORMS Journal on Computing (2003).

Cited by (49)

Quadratic assignment problem variants: A survey and an effective parallel memetic iterated tabu search
2021, European Journal of Operational Research
Citation Excerpt:
A random diversification operator is called after some TS iterations to avoid stagnation. An improved version of this algorithm is presented by Czapiński (2013). The Parallel Multistart Tabu Search (PMTS) implements the CoTS with two levels of parallelism, first for the neighborhood evaluation, as in Zhu et al. (2010), and second for the tabu list management and move selection, achieving a much higher speedup.
In the Quadratic Assignment Problem (QAP), facilities are assigned to sites in order to minimize interactions between pairs of facilities. Although easy to define, it is among the hardest problems in combinatorial optimization, due to its non-linear nature. After decades of research on the QAP, many variants of this problem arose to deal with different applications. Along with the QAP, we consider four variants – the Quadratic Bottleneck Assignment Problem, the Biquadratic Assignment Problem, the Quadratic Semi-Assignment Problem, and the Generalized QAP – and develop a single framework to solve them all. Our parallel memetic iterated tabu search (PMITS) extends the most successful heuristics to solve the QAP. It combines the diversification phase of generating new local optima found after solutions modified by a new crossover operator that is biased towards one of the parents, with the intensification phase of an effective tabu search which uses a simplified tabu list structure to reduce the number of parameters and a new long-term memory that saves solutions previously visited to speed up the search. Solutions are improved concurrently using parallelism, and a convergence criterion determines whether the search stops according to the best solutions in each parallel search. Computational experiments using the hardest benchmark instances from the literature attest the effectiveness of the PMITS, showing its competitiveness when compared to the state-of-the-art methods, sequential and parallel, to solve the QAP. We also show that PMITS significantly outperforms the best methods found for all four variants of the QAP, significantly updating their literature.
Parallel computational optimization in operations research: A new integrative framework, literature review and research directions
2020, European Journal of Operational Research
Solving optimization problems with parallel algorithms has a long tradition in OR. Its future relevance for solving hard optimization problems in many fields, including finance, logistics, production and design, is leveraged through the increasing availability of powerful computing capabilities. Acknowledging the existence of several literature reviews on parallel optimization, we did not find reviews that cover the most recent literature on the parallelization of both exact and (meta)heuristic methods. However, in the past decade substantial advancements in parallel computing capabilities have been achieved and used by OR scholars so that an overview of modern parallel optimization in OR that accounts for these advancements is beneficial. Another issue from previous reviews results from their adoption of different foci so that concepts used to describe and structure prior literature differ. This heterogeneity is accompanied by a lack of unifying frameworks for parallel optimization across methodologies, application fields and problems, and it has finally led to an overall fragmented picture of what has been achieved and still needs to be done in parallel optimization in OR. This review addresses the aforementioned issues with three contributions: First, we suggest a new integrative framework of parallel computational optimization across optimization problems, algorithms and application domains. The framework integrates the perspectives of algorithmic design and computational implementation of parallel optimization. Second, we apply the framework to synthesize prior research on parallel optimization in OR, focusing on computational studies published in the period 2008–2017. Finally, we suggest research directions for parallel optimization in OR.
A hybrid method integrating an elite genetic algorithm with tabu search for the quadratic assignment problem
2020, Information Sciences
Citation Excerpt:
By introducing two new reaction strategies for tabu search, Fescioglu-Unver and Kokar [18] proposed a self-controlling tabu search algorithm for the QAP, in which the first strategy treats the tabu search algorithm as a target system to be controlled and uses a control-theoretic approach to adjust the parameters that affect search intensification, and the second strategy is a flexible diversification strategy that can adjust the parameters based on the search history. Because tabu search algorithms allow a natural parallel implementation by dividing the burden of the search in the neighborhood among several processors, several parallel implementations have been proposed [13,22]. Tabu search has also been integrated with other meta-heuristics for solving the QAP, such as the whale optimization algorithm [2], and biogeography-based optimization algorithm [28], with the important exception of genetic algorithms.
The Quadratic Assignment Problem (QAP) is one of the most studied classical combinatorial optimization problems. QAP has many practical applications. Designing enhanced meta-heuristic approaches for the QAP is an active research area. In this work, we propose a hybrid algorithm (EGATS) that combines an elite genetic algorithm and tabu search to solve the QAP. In the optimization process, EGATS employs two kinds of elite crossovers, repeated 2-exchange mutation, and tabu search to strike a balance between exploitation and exploration. We evaluated the performance of EGATS through computational experiments on 135 well-known benchmark instances from the quadratic assignment problem library, QAPLIB. EGATS obtained the best-known solution for 131 instances. Compared to other popular meta-heuristic algorithms in the literature, EGATS is a competitive method for the QAP.
A cooperative GPU-based Parallel Multistart Simulated Annealing algorithm for Quadratic Assignment Problem
2018, Engineering Science and Technology, an International Journal
Citation Excerpt:
PMSA obtained the BKS in 7 of 12 instances. The runtimes for PMTS [20] ranged from 0.15 to 16.9 min. The runtimes for PMSA ranged from 5.1 to 11.3 min.
GPU hardware and CUDA architecture provide a powerful platform to develop parallel algorithms. Implementation of heuristic and metaheuristic algorithms on GPUs are limited in literature. Nowadays developing parallel algorithms on GPU becomes very important. In this paper, NP-Hard Quadratic Assignment Problem (QAP) that is one of the combinatorial optimization problems is discussed. Parallel Multistart Simulated Annealing (PMSA) method is developed with CUDA architecture to solve QAP. An efficient method is developed by providing multistart technique and cooperation between threads. The cooperation is occurred with threads in both the same and different blocks. This paper focuses on both acceleration and quality of solutions. Computational experiments conducted on many Quadratic Assignment Problem Library (QAPLIB) instances. The experimental results show that PMSA runs up to 29x faster than a single-core CPU and acquires best known solution in a short time in many benchmark datasets.
A stagnation-aware cooperative parallel breakout local search algorithm for the quadratic assignment problem
2017, Computers and Industrial Engineering
The Quadratic Assignment Problem (QAP) is one of the most challenging NP-Hard combinatorial optimization problems. Circuit-layout design, transportation/traffic engineering, and assigning gates to airplanes are some of the interesting applications of the QAP. In this study, we introduce an enhanced version of a recent local search heuristic, Breakout Local Search Algorithm (BLS), by using the Levenshtein Distance metric for checking the similarity of the new starting points to previously explored QAP permutations. The similarity-checking process prevents the local search algorithm, BLS, from getting stuck in already-explored areas. In addition, the proposed BLS Algorithm (BLS-OpenMP) incorporates multi-threaded computation using OpenMP. The stagnation-aware search for the optimal solutions of the QAP is executed concurrently on several cores with diversified trajectories while considering their similarity to already-discovered local optima. The exploration of the search space is improved by selecting the starting points intelligently and speeding up the fitness evaluations linearly with number of processors/threads. BLS-OpenMP has been tested on the hardest 59 problem instances of the QAPLIB, and it obtained 57 of the best known results. The overall deviation of the achieved solutions from the best known results is 0.019% on average, which is a significant improvement compared with state-of-the-art algorithms.
Development, application, and comparison of hybrid meta-heuristics for urban land-use allocation optimization: Tabu search, genetic, GRASP, and simulated annealing algorithms
2016, Computers, Environment and Urban Systems
Land-use optimization problem (LUOP) that seeks to allocate different land types to land units involves various dimensions and deals with numerous conflicting objectives and a large set of data and variables. Single meta-heuristics are broadly developed and applied for solving LUOP. Despite the acceptable solutions derived from these algorithms, researchers in both academic and practical areas face the interesting question: can we develop an algorithm with better efficiency and solution quality? In the literature of operation research, hybridization, a combination of meta-heuristics, was introduced as a way of generating better algorithms. Therefore, this paper aims at developing novel algorithms through hybridizing Tabu search (TS), genetic algorithm (GA), GRASP, and simulated annealing (SA) and examining their quality and efficiency in practice. Accordingly, low-level teamwork GRASP–GA–TS (LLTGRGATS), high-level relay Greedy–GA–TS, and high-level teamwork SA were developed. Firstly, these algorithms were applied for solving small- and large-size single-row facility layout problem to evaluate their performance and functionality and to select the satisfactory algorithm in comparison with the other developed hybrids. Secondly, the selected algorithm, LLTGRGATS, and SVNS, a recent hybrid algorithm proposed for solving LUOP, were performed on a study area to solve a LUOP with two constraints and seven nonlinear objective functions. The outputs showed that the quality and efficiency of LLTGRGATS were slightly better than those of SVNS and it can be considered as a favorable tool for land-use planning process.
