Learning the Bias Weights for Generalized Nested Rollout Policy Adaptation

Sentuc, Julien; Ellouze, Farah; Lucas, Jean-Yves; Cazenave, Tristan

doi:10.1007/978-3-031-44505-7_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14286)

Included in the following conference series:

International Conference on Learning and Intelligent Optimization

Abstract

Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for single-player games and optimization problems. In this paper, we propose modifying GNRPA to learn the bias weights automatically. The goal is both to obtain better results on sets of dissimilar instances and to avoid having to set some hyperparameters. Experiments show that this improves the algorithm on two different optimization problems: the Vehicle Routing Problem and 3D Bin Packing.
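
The abstract does not spell out the update rule, so the following is only a minimal sketch of what learning a bias weight could look like, not the authors' implementation. It assumes a softmax playout policy in which each legal move m has a learned weight w_m and a fixed heuristic bias beta_m scaled by a single scalar bias weight, and it takes one gradient-ascent step on the log-probability of the chosen move. The names move_probabilities, adapt_step, bias_weight and alpha are hypothetical.

```python
import math


def move_probabilities(weights, biases, bias_weight):
    """Softmax over w_m + bias_weight * beta_m for the legal moves (assumed policy form)."""
    logits = [w + bias_weight * b for w, b in zip(weights, biases)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adapt_step(weights, biases, bias_weight, chosen, alpha=1.0):
    """One gradient-ascent step on log p(chosen).

    d log p(chosen) / d w_m         = [m == chosen] - p_m
    d log p(chosen) / d bias_weight = beta_chosen - sum_k p_k * beta_k
    """
    probs = move_probabilities(weights, biases, bias_weight)
    new_weights = [
        w + alpha * ((1.0 if m == chosen else 0.0) - p)
        for m, (w, p) in enumerate(zip(weights, probs))
    ]
    expected_bias = sum(p * b for p, b in zip(probs, biases))
    new_bias_weight = bias_weight + alpha * (biases[chosen] - expected_bias)
    return new_weights, new_bias_weight


# Illustrative values only: three legal moves, move 0 was played in the best sequence.
weights = [0.0, 0.0, 0.0]
biases = [1.0, 0.0, -1.0]
before = move_probabilities(weights, biases, 1.0)
new_weights, new_bias_weight = adapt_step(weights, biases, 1.0, chosen=0, alpha=0.5)
after = move_probabilities(new_weights, biases, new_bias_weight)
```

In this sketch the bias weight grows whenever the chosen move has a larger bias than the policy's expected bias, so a heuristic that agrees with good sequences gains influence without its scale being tuned by hand.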

Acknowledgment

Thanks to Clément Royer for advising us to use a gradient when possible. This work was supported in part by the French government under the management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute).

Author information

Authors and Affiliations

LAMSADE, Université Paris Dauphine - PSL, CNRS, Paris, France
Julien Sentuc, Farah Ellouze & Tristan Cazenave
OSIRIS Department, EDF Lab Paris-Saclay, Electricité de France, Palaiseau, France
Jean-Yves Lucas

Corresponding author

Correspondence to Tristan Cazenave.

Editor information

Editors and Affiliations

InsideOpt, Dover, DE, USA
Meinolf Sellmann
Bielefeld University, Bielefeld, Germany
Kevin Tierney

Cite this paper

Sentuc, J., Ellouze, F., Lucas, JY., Cazenave, T. (2023). Learning the Bias Weights for Generalized Nested Rollout Policy Adaptation. In: Sellmann, M., Tierney, K. (eds) Learning and Intelligent Optimization. LION 2023. Lecture Notes in Computer Science, vol 14286. Springer, Cham. https://doi.org/10.1007/978-3-031-44505-7_14

DOI: https://doi.org/10.1007/978-3-031-44505-7_14
Published: 25 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44504-0
Online ISBN: 978-3-031-44505-7
eBook Packages: Computer Science, Computer Science (R0)
