Abstract
Routing problems are a classical class of combinatorial optimization problems that have been studied for decades by researchers from diverse backgrounds. In recent years, Deep Reinforcement Learning (DRL) has been widely applied in self-driving, robotics, industrial automation, video games, and other fields, demonstrating strong decision-making and learning ability. In this paper, we propose a new graph transformer model, trained with a DRL algorithm, for minimizing the route length of a given routing problem. Specifically, the actor-network parameters are trained by an improved REINFORCE algorithm that effectively reduces the variance and adjusts the frequency of the reward values. Further, positional encoding is used in the encoder so that the embeddings of the multiple nodes satisfy translation invariance, enhancing the stability of the model. The aggregation operation of a graph neural network is applied in the decoding stage of the transformer, which effectively captures the topological structure of the graph and the latent relationships between nodes. We apply our model to two classical routing problems, the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). Experimental results show that on small and medium-sized TSP and CVRP instances our model surpasses state-of-the-art DRL-based methods and some traditional algorithms, and it also provides an effective strategy for solving combinatorial optimization problems on graphs.
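The variance reduction mentioned in the abstract rests on the standard REINFORCE idea of subtracting a baseline from each sampled reward before forming the policy gradient. The sketch below shows only that advantage computation on toy tour lengths, with a hypothetical greedy-rollout baseline; it is an illustration of the general technique, not the authors' exact improved algorithm.

```python
import numpy as np

def reinforce_advantages(sampled_lengths, baseline_lengths):
    """Advantage terms for a REINFORCE policy-gradient update.

    Subtracting a baseline (here, assumed to come from a greedy
    rollout of the same policy) from each sampled tour length reduces
    the variance of the gradient estimate without biasing it.
    """
    sampled_lengths = np.asarray(sampled_lengths, dtype=float)
    baseline_lengths = np.asarray(baseline_lengths, dtype=float)
    # Since tour length is a cost, a tour shorter than the baseline
    # yields a negative advantage and is reinforced when the loss
    # advantage * log-prob is minimized.
    return sampled_lengths - baseline_lengths

# Toy example: two sampled tours against a greedy-rollout baseline.
adv = reinforce_advantages([4.0, 6.0], [5.0, 5.0])
print(adv)  # [-1.  1.]
```

In practice these advantages multiply the log-probabilities of the sampled tours; only their relative sign and magnitude matter, which is why the baseline can be subtracted freely.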
References
Cook, W.J., Cunningham, W.H., Pulleyblank, W.R., Schrijver, A.: Combinatorial Optimization. Wiley, New York (2010)
Bellmore, M., Nemhauser, G.L.: The traveling salesman problem: a survey. Oper. Res. 16(3), 538–558 (1968)
Ritzinger, U., Puchinger, J., Hartl, R.F.: A survey on dynamic and stochastic vehicle routing problems. Int. J. Prod. Res. 54(1), 215–231 (2016)
Papadimitriou, C.H.: The Euclidean travelling salesman problem is NP-complete. Theoret. Comput. Sci. 4(3), 237–244 (1977)
Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d’Horizon. Eur. J. Oper. Res. 290(2), 405–421 (2021)
Wang, Q., Tang, C.: Deep reinforcement learning for transportation network combinatorial optimization: a survey. Knowl.-Based Syst. 233, 107526 (2021)
Vesselinova, N., Steinert, R., Perez-Ramirez, D.F., Boman, M.: Learning combinatorial optimization on graphs: a survey with applications to networking. IEEE Access 8, 120388–120416 (2020)
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2008)
Vaswani, A., et al.: Attention is all you need. In: 31st International Conference on Neural Information Processing Systems, pp. 5998–6008. MIT Press, Cambridge (2017)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3), 229–256 (1992)
Gu, Q., Wang, Q., Li, X., Li, X.: A surrogate-assisted multi-objective particle swarm optimization of expensive constrained combinatorial optimization problems. Knowl.-Based Syst. 223, 107049 (2021)
Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2001). https://doi.org/10.1007/978-3-662-04565-7
Hamzadayı, A., Baykasoğlu, A., Akpınar, S.: Solving combinatorial optimization problems with single seekers society algorithm. Knowl.-Based Syst. 201, 106036 (2020)
Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: 29th Neural Information Processing Systems, pp. 2692–2700. MIT Press, Cambridge (2015)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: 29th Neural Information Processing Systems, pp. 3104–3112. MIT Press, Cambridge (2014)
Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940 (2016)
Nazari, M., Oroojlooy, A., Snyder, L.V., Takáč, M.: Reinforcement learning for solving the vehicle routing problem. In: 32nd Neural Information Processing Systems, pp. 9839–9849. MIT Press, Cambridge (2018)
Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs. In: 31st Neural Information Processing Systems, pp. 6351–6361. MIT Press, Cambridge (2017)
Kool, W., Van Hoof, H., Welling, M.: Attention, learn to solve routing problems. arXiv preprint arXiv:1803.08475 (2018)
Peng, B., Wang, J., Zhang, Z.: A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems. In: Li, K., Li, W., Wang, H., Liu, Y. (eds.) ISICA 2019. CCIS, vol. 1205, pp. 636–650. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-5577-0_51
Chen, X., Tian, Y.: Learning to perform local rewriting for combinatorial optimization. In: 33rd Neural Information Processing Systems, pp. 6281–6292. MIT Press, Cambridge (2019)
Kwon, Y.D., Choo, J., Kim, B., Yoon, I., Gwon, Y., Min, S.: POMO: policy optimization with multiple optima for reinforcement learning. arXiv preprint arXiv:2010.16011 (2020)
Wu, Y., Song, W., Cao, Z., Zhang, J., Lim, A.: Learning improvement heuristics for solving routing problems. IEEE Trans. Neural Netw. Learn. Syst. 1–13 (2021)
Xin, L., Song, W., Cao, Z., Zhang, J.: Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In: 35th AAAI Conference on Artificial Intelligence, pp. 12042–12049, Menlo Park, CA (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 34th IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Piscataway, NJ (2016)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: 32nd International Conference on Machine Learning, pp. 448–456, New York, NY (2015)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Applegate, D.L., et al.: Certification of an optimal TSP tour through 85,900 cities. Oper. Res. Lett. 37(1), 11–15 (2009)
Helsgaun, K.: An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Roskilde University, Roskilde (2017)
Acknowledgment
This work is supported in part by the National Natural Science Foundation of China (11761042). Moreover, we thank Kool et al. [19] and Kwon et al. [22] for sharing their source code, which served as the initial basis for our work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., Chen, Z. (2023). A Deep Reinforcement Learning Algorithm Using A New Graph Transformer Model for Routing Problems. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 544. Springer, Cham. https://doi.org/10.1007/978-3-031-16075-2_26
Print ISBN: 978-3-031-16074-5
Online ISBN: 978-3-031-16075-2
eBook Packages: Intelligent Technologies and Robotics