Abstract
The use of deep reinforcement learning (DRL) techniques to solve classical combinatorial optimization problems such as the Traveling Salesman Problem (TSP) has attracted considerable attention, owing to the flexibility and speed of model-based inference. However, DRL training often suffers from low efficiency and limited scalability, which hinders model generalization. This paper proposes a simple yet effective pre-training method that uses behavior cloning to initialize the neural network parameters for policy gradient DRL. To reduce the large number of demonstrations that behavior cloning requires, we exploit the symmetry of TSP solutions for data augmentation. We demonstrate our method by enhancing the state-of-the-art policy gradient models Attention and POMO on the TSP. Experimental results show that the optimality gap of the solutions is significantly reduced while DRL training time is greatly shortened, which also enables effective and efficient solving of larger TSP instances.
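To make the two ideas in the abstract concrete, the sketch below illustrates (i) the solution-symmetry augmentation: because a tour is a cycle, each optimal tour induces 2N equivalent node sequences (N cyclic rotations times 2 traversal directions), and (ii) a behavior cloning loss that pre-trains a constructive policy on such demonstrations before policy gradient fine-tuning. This is a minimal illustration under assumed conventions, not the authors' implementation: `augment_tour` and `behavior_cloning_loss` are hypothetical names, and the masking of already-visited nodes is assumed to happen inside the policy.

```python
import torch
import torch.nn.functional as F


def augment_tour(tour):
    """Expand one optimal tour into its 2N equivalent node sequences.

    A TSP tour is a cycle, so every cyclic rotation of the sequence and
    the reversed traversal direction encode the same solution; one
    expert tour therefore yields 2N behavior cloning demonstrations.
    """
    n = len(tour)
    rotations = [tour[i:] + tour[:i] for i in range(n)]
    return rotations + [list(reversed(r)) for r in rotations]


def behavior_cloning_loss(step_logits, expert_tour):
    """Cross-entropy between the policy's next-node distribution and the
    expert's choice at each decoding step.

    step_logits: (n-1, n) tensor of unnormalized scores, one row per step
        after the start node (visited nodes assumed masked upstream).
    expert_tour: list of n node indices giving the expert's visit order.
    """
    targets = torch.tensor(expert_tour[1:])  # node chosen at each step
    return F.cross_entropy(step_logits, targets)


# Toy usage: ten demonstrations from one 5-node tour, one gradient step.
tour = [0, 2, 4, 3, 1]
demos = augment_tour(tour)                  # 2 * 5 = 10 equivalent tours
logits = torch.randn(len(tour) - 1, len(tour), requires_grad=True)
loss = behavior_cloning_loss(logits, demos[0])
loss.backward()                             # pre-training step, before REINFORCE fine-tuning
```

The same cyclic symmetry is what POMO exploits at training time by rolling out from every start node; here it is used offline to multiply the expert demonstrations available for cloning.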
References
Aarts, E.H., Lenstra, J.K.: Local search in combinatorial optimization. Princeton University Press (2003)
Bellman, R.: Dynamic programming treatment of the travelling salesman problem. J. ACM 9(1), 61–63 (1962)
Bello, I., Pham, H., Le, Q.V., Norouzi, M., Bengio, S.: Neural combinatorial optimization with reinforcement learning. In: Proceedings of International Conference on Learning Representations (ICLR) (2017)
Bengio, Y., Lodi, A., Prouvost, A.: Machine learning for combinatorial optimization: a methodological tour d’horizon. Eur. J. Oper. Res. 290(2), 405–421 (2021)
Dai, H., Dai, B., Song, L.: Discriminative embeddings of latent variable models for structured data. In: International Conference on Machine Learning, pp. 2702–2711. PMLR (2016)
Applegate, D., Bixby, R., Chvátal, V., Cook, W.: Concorde TSP Solver (2006). https://www.math.uwaterloo.ca/tsp/concorde/index.html
Halim, A.H., Ismail, I.: Combinatorial optimization: comparison of heuristic algorithms in travelling salesman problem. Arch. Comput. Methods Eng. 26, 367–380 (2019)
Helsgaun, K.: An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Technical report, Roskilde University (2017)
Hussein, A., Gaber, M.M., Elyan, E., Jayne, C.: Imitation learning: a survey of learning methods. ACM Comput. Surv. (CSUR) 50(2), 1–35 (2017)
Khalil, E., Dai, H., Zhang, Y., Dilkina, B., Song, L.: Learning combinatorial optimization algorithms over graphs. Adv. Neural. Inf. Process. Syst. 30 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of International Conference on Learning Representations (ICLR) (2015)
Kool, W., van Hoof, H., Welling, M.: Attention, learn to solve routing problems! In: International Conference on Learning Representations (2019)
Kwon, Y.D., Choo, J., Kim, B., Yoon, I., Gwon, Y., Min, S.: POMO: policy optimization with multiple optima for reinforcement learning. Adv. Neural. Inf. Process. Syst. 33, 21188–21198 (2020)
Lawler, E.L., Wood, D.E.: Branch-and-bound methods: a survey. Oper. Res. 14(4), 699–719 (1966)
Ma, Q., Ge, S., He, D., Thaker, D., Drori, I.: Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning. In: AAAI Workshop on Deep Learning on Graphs: Methodologies and Applications (2020)
Matai, R., Singh, S.P., Mittal, M.L.: Traveling salesman problem: an overview of applications, formulations, and solution approaches. Traveling Salesman Problem, Theory and Applications 1 (2010)
da Costa, P.R.d.O., Rhuggenaath, J., Zhang, Y., Akcay, A.: Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning. In: Asian Conference on Machine Learning, pp. 465–480. PMLR (2020)
Papadimitriou, C.H.: The Euclidean travelling salesman problem is NP-complete. Theoret. Comput. Sci. 4(3), 237–244 (1977)
Perron, L., Furnon, V.: OR-Tools (2022). https://developers.google.com/optimization/
Pomerleau, D.A.: ALVINN: an autonomous land vehicle in a neural network. Adv. Neural. Inf. Process. Syst. 1 (1988)
Rajeswaran, A., et al.: Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In: Proceedings of Robotics: Science and Systems. Pittsburgh, Pennsylvania (June 2018)
Riedmiller, M.: Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317–328. Springer, Heidelberg (2005). https://doi.org/10.1007/11564096_32
Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 627–635. JMLR Workshop and Conference Proceedings (2011)
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv. Neural. Inf. Process. Syst. 27 (2014)
Thrun, S., Littman, M.L.: Reinforcement learning: an introduction. AI Mag. 21(1), 103–103 (2000)
Torabi, F., Warnell, G., Stone, P.: Recent advances in imitation learning from observation. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 6324–6331 (2019)
Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. Adv. Neural. Inf. Process. Syst. 28 (2015)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Reinforc. Learn., 5–32 (1992)
Williamson, D.P., Shmoys, D.B.: The design of approximation algorithms. Cambridge University Press (2011)
Xin, L., Song, W., Cao, Z., Zhang, J.: Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 12042–12049 (2021)
Xin, L., Song, W., Cao, Z., Zhang, J.: Step-wise deep learning models for solving routing problems. IEEE Trans. Industr. Inf. 17(7), 4861–4871 (2021)
Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: International Conference on Learning Representations (2019)
Yang, H., Gu, M.: A new baseline of policy gradient for traveling salesman problem. In: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–7. IEEE (2022)
Zaheer, M., et al.: Big bird: transformers for longer sequences. Adv. Neural. Inf. Process. Syst. 33 (2020)
Acknowledgments
This work is supported by the Taishan Scholars Young Expert Project of Shandong Province (No. tsqn202211215) and the National Science Foundation of China (Nos. 12271098 and 61772005). The corresponding author is Longkun Guo (lkguo@fzu.edu.cn).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhang, Y., Liao, K., Liao, Z., Guo, L. (2024). Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning. In: Yang, DN., Xie, X., Tseng, V.S., Pei, J., Huang, JW., Lin, J.CW. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2024. Lecture Notes in Computer Science, vol. 14646. Springer, Singapore. https://doi.org/10.1007/978-981-97-2253-2_26
DOI: https://doi.org/10.1007/978-981-97-2253-2_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2252-5
Online ISBN: 978-981-97-2253-2