Abstract
The use of deep reinforcement learning (DRL) techniques to solve classical combinatorial optimization problems such as the Traveling Salesman Problem (TSP) has attracted considerable attention, owing to the flexibility and speed of model-based inference. However, DRL training often suffers from low efficiency and limited scalability, which hinders model generalization. This paper proposes a simple yet effective pre-training method that uses behavior cloning to initialize the neural network parameters for policy gradient DRL. To reduce the large number of demonstrations that behavior cloning requires, we exploit the symmetry of TSP solutions for data augmentation. We demonstrate our method by enhancing the state-of-the-art policy gradient models Attention and POMO on the TSP. Experimental results show that the optimality gap of the solutions is significantly reduced while DRL training time is greatly shortened, which also enables effective and efficient solving of larger TSP instances.
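To make the two ideas in the abstract concrete, the sketch below illustrates (i) the solution-symmetry augmentation: because a tour is a cycle, each optimal tour induces 2N equivalent node sequences (N cyclic rotations times 2 traversal directions), and (ii) a behavior cloning loss that pre-trains a constructive policy on such demonstrations before policy gradient fine-tuning. This is a minimal illustration under assumed conventions, not the authors' implementation: `augment_tour` and `behavior_cloning_loss` are hypothetical names, and the masking of already-visited nodes is assumed to happen inside the policy.

```python
import torch
import torch.nn.functional as F


def augment_tour(tour):
    """Expand one optimal tour into its 2N equivalent node sequences.

    A TSP tour is a cycle, so every cyclic rotation of the sequence and
    the reversed traversal direction encode the same solution; one
    expert tour therefore yields 2N behavior cloning demonstrations.
    """
    n = len(tour)
    rotations = [tour[i:] + tour[:i] for i in range(n)]
    return rotations + [list(reversed(r)) for r in rotations]


def behavior_cloning_loss(step_logits, expert_tour):
    """Cross-entropy between the policy's next-node distribution and the
    expert's choice at each decoding step.

    step_logits: (n-1, n) tensor of unnormalized scores, one row per step
        after the start node (visited nodes assumed masked upstream).
    expert_tour: list of n node indices giving the expert's visit order.
    """
    targets = torch.tensor(expert_tour[1:])  # node chosen at each step
    return F.cross_entropy(step_logits, targets)


# Toy usage: ten demonstrations from one 5-node tour, one gradient step.
tour = [0, 2, 4, 3, 1]
demos = augment_tour(tour)                  # 2 * 5 = 10 equivalent tours
logits = torch.randn(len(tour) - 1, len(tour), requires_grad=True)
loss = behavior_cloning_loss(logits, demos[0])
loss.backward()                             # pre-training step, before REINFORCE fine-tuning
```

The same cyclic symmetry is what POMO exploits at training time by rolling out from every start node; here it is used offline to multiply the expert demonstrations available for cloning.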
References
Aarts, E.H., Lenstra, J.K.: Local search in combinatorial optimization. Princeton University Press (2003)
Bellman, R.: Dynamic programming treatment of the travelling salesman problem. J. ACM 9(1), 61–63 (1962)
Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning. In: Proceedings of International Conference on Learning Representations (ICLR) (2017)
Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur. J. Oper. Res. 290(2), 405–421 (2021)
Dai, H., Dai, B., Song, L.: Discriminative embeddings of latent variable models for structured data. In: International Conference on Machine Learning, pp. 2702–2711. PMLR (2016)
Applegate, D., Bixby, R., Chvátal, V., Cook, W.: Concorde TSP Solver (2006). https://www.math.uwaterloo.ca/tsp/concorde/index.html
Halim, A.H., Ismail, I.: Combinatorial optimization: comparison of heuristic algorithms in travelling salesman problem. Arch. Comput. Methods Eng. 26, 367–380 (2019)
Helsgaun, K.: An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Technical report, Roskilde University (2017)
Hussein, A., Gaber, M.M., Elyan, E., Jayne, C.: Imitation learning: a survey of learning methods. ACM Comput. Surv. (CSUR) 50(2), 1–35 (2017)
Khalil, E., Dai, H., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs. Adv. Neural. Inf. Process. Syst. 30 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of International Conference on Learning Representations (ICLR) (2015)
Kool, W., van Hoof, H., Welling, M.: Attention, learn to solve routing problems! In: International Conference on Learning Representations (2019)
Kwon, Y.D., Choo, J., Kim, B., Yoon, I., Gwon, Y., Min, S.: POMO: policy optimization with multiple optima for reinforcement learning. Adv. Neural. Inf. Process. Syst. 33, 21188–21198 (2020)
Lawler, E.L., Wood, D.E.: Branch-and-bound methods: a survey. Oper. Res. 14(4), 699–719 (1966)
Ma, Q., Ge, S., He, D., Thaker, D., Drori, I.: Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. In: AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications (2020)
Matai, R., Singh, S.P., Mittal, M.L.: Traveling salesman problem: an overview of applications, formulations, and solution approaches. Traveling Salesman Problem, Theory and Applications 1 (2010)
da Costa, P.R.d.O., Rhuggenaath, J., Zhang, Y., Akcay, A.: Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning. In: Asian Conference on Machine Learning, pp. 465–480. PMLR (2020)
Papadimitriou, C.H.: The Euclidean travelling salesman problem is NP-complete. Theoret. Comput. Sci. 4(3), 237–244 (1977)
Perron, L., Furnon, V.: OR-Tools (2022). https://developers.google.com/optimization/
Pomerleau, D.A.: ALVINN: an autonomous land vehicle in a neural network. Adv. Neural. Inf. Process. Syst. 1 (1988)
Rajeswaran, A., et al.: Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In: Proceedings of Robotics: Science and Systems. Pittsburgh, Pennsylvania (June 2018)
Riedmiller, M.: Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317–328. Springer, Heidelberg (2005). https://doi.org/10.1007/11564096_32
Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635. JMLR Workshop and Conference Proceedings (2011)
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv. Neural. Inf. Process. Syst. 27 (2014)
Thrun, S., Littman, M.L.: Reinforcement learning: an introduction. AI Mag. 21(1), 103–103 (2000)
Torabi, F., Warnell, G., Stone, P.: Recent advances in imitation learning from observation. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 6324–6331 (2019)
Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. Adv. Neural. Inf. Process. Syst. 28 (2015)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Reinforc. Learn., 5–32 (1992)
Williamson, D.P., Shmoys, D.B.: The design of approximation algorithms. Cambridge University Press (2011)
Xin, L., Song, W., Cao, Z., Zhang, J.: Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 12042–12049 (2021)
Xin, L., Song, W., Cao, Z., Zhang, J.: Step-wise deep learning models for solving routing problems. IEEE Trans. Industr. Inf. 17(7), 4861–4871 (2021)
Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: International Conference on Learning Representations (2019)
Yang, H., Gu, M.: A new baseline of policy gradient for traveling salesman problem. In: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–7. IEEE (2022)
Zaheer, M., et al.: Big bird: transformers for longer sequences. Adv. Neural. Inf. Process. Syst. 33 (2020)
Acknowledgments
This work is supported by the Taishan Scholars Young Expert Project of Shandong Province (No. tsqn202211215) and the National Science Foundation of China (Nos. 12271098 and 61772005). The corresponding author is Longkun Guo (lkguo@fzu.edu.cn).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhang, Y., Liao, K., Liao, Z., Guo, L. (2024). Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol. 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_26
DOI: https://doi.org/10.1007/978-981-97-2253-2_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2252-5
Online ISBN: 978-981-97-2253-2