
Mastering construction heuristics with self-play deep reinforcement learning

  • Original Article
  • Published in: Neural Computing and Applications
Abstract

Learning heuristics that construct solutions automatically, without expert experience, has long been a central challenge of combinatorial optimization. Building an agent with the planning ability to solve multiple problems simultaneously is likewise a long-standing goal of artificial intelligence. Nonetheless, most current learning-based methods for combinatorial optimization still rely on hand-designed heuristics, and because the environment's dynamics in real-world problems are often unknown and complex, such methods are difficult to generalize and deploy. Inspired by AlphaGo Zero, we propose CH-Zero, a novel self-play reinforcement learning algorithm based on Monte Carlo tree search (MCTS) for routing optimization problems. Like AlphaGo Zero, CH-Zero requires no expert experience, only a few necessary rules. Unlike other MCTS-based self-play algorithms, however, it separates offline training from online inference: we first train policy and value networks offline via self-play reinforcement learning without MCTS, and then combine the learned heuristics and networks with MCTS to infer solutions on unseen instances. Omitting MCTS during training yields a lightweight self-play framework whose learning efficiency is much higher than that of existing self-play-based methods for combinatorial optimization, while the learned heuristics can still guide MCTS at runtime to improve the policy and select better actions.
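To make the two-phase design concrete, the sketch below illustrates only the online inference phase described above: a PUCT-style tree search, as in AlphaGo Zero, that constructs a TSP tour city by city under guidance from a policy/value function. This is a minimal illustration under our own assumptions, not the paper's CH-Zero implementation: the `policy_value` stub (softmax-over-distance prior, greedy-completion value), the hyperparameters `c_puct` and `sims`, and the random instance are all placeholders standing in for the offline-trained networks.

```python
# Minimal sketch (our assumptions, not the paper's exact method) of
# network-guided MCTS inference for TSP tour construction.
import math
import random

def tour_len(tour, dist):
    """Length of the closed tour."""
    return sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1)) \
        + dist[tour[-1]][tour[0]]

def greedy_cost(partial, unvisited, dist):
    """Cost of completing the partial tour greedily (cheap value estimate)."""
    tour, left, cur = list(partial), set(unvisited), partial[-1]
    while left:
        cur = min(left, key=lambda c: dist[cur][c])
        tour.append(cur)
        left.remove(cur)
    return tour_len(tour, dist)

def policy_value(partial, unvisited, dist):
    """Hypothetical stand-in for the offline-trained policy/value networks:
    prior = softmax of negative distance to each candidate city,
    value = negative cost of a greedy completion. Placeholders only."""
    cand, last = list(unvisited), partial[-1]
    logits = [-dist[last][c] for c in cand]
    m = max(logits)
    exp = [math.exp(x - m) for x in logits]
    z = sum(exp)
    priors = {c: e / z for c, e in zip(cand, exp)}
    return priors, -greedy_cost(partial, unvisited, dist)

class Node:
    def __init__(self, prior):
        self.P, self.N, self.W = prior, 0, 0.0
        self.children = {}  # action (next city) -> Node
    @property
    def Q(self):
        return self.W / self.N if self.N else 0.0

def puct_select(node, c_puct=1.5):
    """AlphaGo Zero-style selection: argmax of Q + c_puct * P * sqrt(sum N)/(1+N)."""
    total = math.sqrt(sum(ch.N for ch in node.children.values()) + 1)
    return max(node.children.items(),
               key=lambda kv: kv[1].Q + c_puct * kv[1].P * total / (1 + kv[1].N))

def mcts_next_city(partial, unvisited, dist, sims=200):
    """Run `sims` simulations from the current partial tour; return the
    most-visited next city (the learned prior guides exploration)."""
    root = Node(1.0)
    priors, _ = policy_value(partial, unvisited, dist)
    root.children = {c: Node(p) for c, p in priors.items()}
    for _ in range(sims):
        node, path, left, trail = root, list(partial), set(unvisited), [root]
        while node.children:                      # 1. select down the tree
            city, node = puct_select(node)
            path.append(city)
            left.discard(city)
            trail.append(node)
        if left:                                  # 2. expand with network priors
            pri, value = policy_value(path, left, dist)
            node.children = {c: Node(p) for c, p in pri.items()}
        else:                                     # terminal: exact tour length
            value = -tour_len(path, dist)
        for n in trail:                           # 3. back up the value
            n.N += 1
            n.W += value
    return max(root.children.items(), key=lambda kv: kv[1].N)[0]

if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(10)]
    dist = [[math.dist(a, b) for b in pts] for a in pts]
    tour, left = [0], set(range(1, 10))
    while left:
        c = mcts_next_city(tour, left, dist)
        tour.append(c)
        left.remove(c)
    print("tour:", tour, "length: %.3f" % tour_len(tour, dist))
```

In the paper's setting, the `policy_value` stub would be replaced by the policy and value networks trained offline via self-play, which is what allows the search to remain effective on unseen instances without hand-designed rollout heuristics.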



Author information

Corresponding author

Correspondence to Qi Wang.

Ethics declarations

Conflict of interest

We wish to confirm that there are no known conflicts of interest associated with this publication and that there has been no significant financial support for this work that could have influenced its outcome. We confirm that the manuscript has been read and approved by all named authors, that there are no other persons who satisfied the criteria for authorship but are not listed, and that all authors have approved the order of authors listed in the manuscript. We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property; in doing so, we have followed the regulations of our institutions concerning intellectual property. We understand that the corresponding author is the sole contact for the editorial process (including Editorial Manager and direct communications with the office) and is responsible for communicating with the other authors about progress, submissions of revisions, and final approval of proofs. We confirm that we have provided a current, correct email address that is accessible by the corresponding author and configured to accept email from 17110240039@fudan.edu.cn.

Ethical approval

Our study did not raise any ethical concerns: no human or animal subjects were involved. This paper focuses solely on combinatorial optimization on graphs in computer science, and the techniques used are all standard computational methods, including deep learning, reinforcement learning, and Monte Carlo tree search. As our research constitutes theoretical and applied innovation in computer science, it does not involve ethical or moral issues.

Informed consent

All authors are aware of this article and agree to its submission.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Q., He, Y. & Tang, C. Mastering construction heuristics with self-play deep reinforcement learning. Neural Comput & Applic 35, 4723–4738 (2023). https://doi.org/10.1007/s00521-022-07989-6

