Abstract
Routing problems are a classical class of combinatorial optimization problems that have been studied for decades by researchers from diverse backgrounds. In recent years, Deep Reinforcement Learning (DRL) has been widely applied in self-driving, robotics, industrial automation, video games, and other fields, demonstrating strong decision-making and learning ability. In this paper, we propose a new graph transformer model, trained with a DRL algorithm, for minimizing the route length of a given routing problem. Specifically, the actor-network parameters are trained by an improved REINFORCE algorithm that effectively reduces the variance and adjusts the frequency of the reward values. Further, positional encoding is used in the encoder so that the embeddings of the multiple nodes satisfy translation invariance, enhancing the stability of the model. The aggregation operation of a graph neural network is applied in the decoding stage of the transformer, which effectively captures the topological structure of the graph and the latent relationships between nodes. We apply our model to two classical routing problems, the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). Experimental results show that on small and medium-sized TSP and CVRP instances our model surpasses state-of-the-art DRL-based methods and some traditional algorithms, and it also provides an effective strategy for solving combinatorial optimization problems on graphs.
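The variance reduction mentioned in the abstract rests on the standard REINFORCE idea of subtracting a baseline from each sampled reward before forming the policy gradient. The sketch below shows only that advantage computation on toy tour lengths, with a hypothetical greedy-rollout baseline; it is an illustration of the general technique, not the authors' exact improved algorithm.

```python
import numpy as np

def reinforce_advantages(sampled_lengths, baseline_lengths):
    """Advantage terms for a REINFORCE policy-gradient update.

    Subtracting a baseline (here, assumed to come from a greedy
    rollout of the same policy) from each sampled tour length reduces
    the variance of the gradient estimate without biasing it.
    """
    sampled_lengths = np.asarray(sampled_lengths, dtype=float)
    baseline_lengths = np.asarray(baseline_lengths, dtype=float)
    # Since tour length is a cost, a tour shorter than the baseline
    # yields a negative advantage and is reinforced when the loss
    # advantage * log-prob is minimized.
    return sampled_lengths - baseline_lengths

# Toy example: two sampled tours against a greedy-rollout baseline.
adv = reinforce_advantages([4.0, 6.0], [5.0, 5.0])
print(adv)  # [-1.  1.]
```

In practice these advantages multiply the log-probabilities of the sampled tours; only their relative sign and magnitude matter, which is why the baseline can be subtracted freely.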
References
Cook, W.J., Cunningham, W.H., Pulleyblank, W.R., Schrijver, A.: Combinatorial Optimization. Wiley, New York (2010)
Bellmore, M., Nemhauser, G.L.: The traveling salesman problem: a survey. Oper. Res. 16(3), 538–558 (1968)
Ritzinger, U., Puchinger, J., Hartl, R.F.: A survey on dynamic and stochastic vehicle routing problems. Int. J. Prod. Res. 54(1), 215–231 (2016)
Papadimitriou, C.H.: The Euclidean travelling salesman problem is NP-complete. Theoret. Comput. Sci. 4(3), 237–244 (1977)
Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d’Horizon. Eur. J. Oper. Res. 290(2), 405–421 (2021)
Wang, Q., Tang, C.: Deep reinforcement learning for transportation network combinatorial optimization: a survey. Knowl.-Based Syst. 233, 107526 (2021)
Vesselinova, N., Steinert, R., Perez-Ramirez, D.F., Boman, M.: Learning combinatorial optimization on graphs: a survey with applications to networking. IEEE Access 8, 120388–120416 (2020)
Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61–80 (2008)
Vaswani, A., et al.: Attention is all you need. In: 31st International Conference on Neural Information Processing Systems, pp. 5998–6008. MIT Press, Cambridge (2017)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3), 229–256 (1992)
Gu, Q., Wang, Q., Li, X., Li, X.: A surrogate-assisted multi-objective particle swarm optimization of expensive constrained combinatorial optimization problems. Knowl.-Based Syst. 223, 107049 (2021)
Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2001). https://doi.org/10.1007/978-3-662-04565-7
Hamzadayı, A., Baykasoğlu, A., Akpınar, S.: Solving combinatorial optimization problems with single seekers society algorithm. Knowl.-Based Syst. 201, 106036 (2020)
Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: 29th Neural Information Processing Systems, pp. 2692–2700. MIT Press, Cambridge (2015)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: 29th Neural Information Processing Systems, pp. 3104–3112. MIT Press, Cambridge (2014)
Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940 (2016)
Nazari, M., Oroojlooy, A., Snyder, L.V., Takáč, M.: Reinforcement learning for solving the vehicle routing problem. In: 32nd Neural Information Processing Systems, pp. 9839–9849. MIT Press, Cambridge (2018)
Dai, H., Khalil, E.B., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs. In: 31st Neural Information Processing Systems, pp. 6351–6361. MIT Press, Cambridge (2017)
Kool, W., Van Hoof, H., Welling, M.: Attention, learn to solve routing problems. arXiv preprint arXiv:1803.08475 (2018)
Peng, B., Wang, J., Zhang, Z.: A deep reinforcement learning algorithm using dynamic attention model for vehicle routing problems. In: Li, K., Li, W., Wang, H., Liu, Y. (eds.) ISICA 2019. CCIS, vol. 1205, pp. 636–650. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-5577-0_51
Chen, X., Tian, Y.: Learning to perform local rewriting for combinatorial optimization. In: 33rd Neural Information Processing Systems, pp. 6281–6292. MIT Press, Cambridge (2019)
Kwon, Y.D., Choo, J., Kim, B., Yoon, I., Gwon, Y., Min, S.: POMO: policy optimization with multiple optima for reinforcement learning. arXiv preprint arXiv:2010.16011 (2020)
Wu, Y., Song, W., Cao, Z., Zhang, J., Lim, A.: Learning improvement heuristics for solving routing problems. IEEE Trans. Neural Netw. Learn. Syst. 1–13 (2021)
Xin, L., Song, W., Cao, Z., Zhang, J.: Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In: 35th AAAI Conference on Artificial Intelligence, pp. 12042–12049, Menlo Park, CA (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 34th IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Piscataway, NJ (2016)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: 32nd International Conference on Machine Learning, pp. 448–456, New York, NY (2015)
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Applegate, D.L., et al.: Certification of an optimal TSP tour through 85,900 cities. Oper. Res. Lett. 37(1), 11–15 (2009)
Helsgaun, K.: An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Roskilde University, Roskilde (2017)
Acknowledgment
This work is supported in part by the National Natural Science Foundation of China (11761042). Moreover, we thank Kool et al. [19] and Kwon et al. [22] for sharing their source code, which served as the initial basis for our work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., Chen, Z. (2023). A Deep Reinforcement Learning Algorithm Using A New Graph Transformer Model for Routing Problems. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 544. Springer, Cham. https://doi.org/10.1007/978-3-031-16075-2_26
Print ISBN: 978-3-031-16074-5
Online ISBN: 978-3-031-16075-2
eBook Packages: Intelligent Technologies and Robotics