ABSTRACT
Particle Swarm Optimization (PSO) has been widely used in various optimization tasks (e.g., neural architecture search and autonomous vehicle navigation), because it can solve non-convex optimization problems with simplicity and efficacy. However, the PSO algorithm is often time-consuming, especially for high-dimensional problems, which hinders its applicability in time-critical applications. In this paper, we propose novel techniques to accelerate the PSO algorithm with GPUs. To mitigate the efficiency bottleneck, we formally model the PSO optimization as a process of element-wise operations on matrices. Based on this modeling, we develop an efficient GPU algorithm that performs the element-wise operations in a massively parallel fashion using the tensor cores and shared memory. Moreover, we propose a series of novel techniques to improve our algorithm, including (i) GPU resource-aware thread creation, which avoids creating too many threads when the number of particles/dimensions is large; (ii) parallel initialization of swarm particles with fast random number generation; (iii) GPU memory caching to manage swarm information instead of allocating new memory; and (iv) a schema to support customized swarm evaluation functions. We conduct extensive experiments on four optimization applications to study the efficiency of our algorithm, called "FastPSO". Experimental results show that FastPSO consistently outperforms the existing CPU-based PSO libraries by two orders of magnitude and surpasses the existing GPU-based implementation by 5 to 7 times, while achieving better or competitive optimization results.
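The element-wise matrix formulation at the core of the approach can be illustrated with a minimal sketch. The CUDA kernels, tensor-core usage, and library interface of FastPSO are not reproduced here; the code below is only a NumPy illustration of how the standard PSO velocity and position updates reduce to element-wise (Hadamard) operations on particle matrices, using the sphere function as a stand-in objective and hyperparameter values (w, c1, c2) chosen for illustration.

```python
import numpy as np

def sphere(x):
    # Objective f(x) = sum_i x_i^2; minimum 0 at the origin.
    return np.sum(x * x, axis=1)

def pso(f, n_particles=64, n_dims=8, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, (n_particles, n_dims))  # particle positions
    v = np.zeros_like(x)                               # particle velocities
    pbest, pbest_val = x.copy(), f(x)                  # per-particle bests
    g = pbest[np.argmin(pbest_val)]                    # global best position
    for _ in range(iters):
        r1 = rng.random((n_particles, n_dims))
        r2 = rng.random((n_particles, n_dims))
        # Both updates are purely element-wise on (n_particles x n_dims)
        # matrices, which is what makes them amenable to GPU parallelism.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        val = f(x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        g = pbest[np.argmin(pbest_val)]
    return g, pbest_val.min()

best_pos, best_val = pso(sphere)
```

In this formulation every entry of the velocity matrix is updated independently, so on a GPU each thread (or tensor-core tile) can own a block of the particle/dimension grid with no cross-thread dependencies inside an iteration.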