Learning the Bias Weights for Generalized Nested Rollout Policy Adaptation

  • Conference paper
Learning and Intelligent Optimization (LION 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14286)

Abstract

Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for single-player games and optimization problems. In this paper we propose a modification of GNRPA that learns the bias weights automatically. The goal is both to obtain better results on sets of dissimilar instances and to avoid tuning some hyperparameters. Experiments show that it improves the algorithm on two different optimization problems: the Vehicle Routing Problem and 3D Bin Packing.
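As an illustration of what learning the bias weights could look like, the following is a minimal sketch and not the paper's implementation. It assumes the usual GNRPA softmax over a policy weight plus a bias for each move, and it further assumes that the bias is a weighted sum of heuristic terms whose weights are updated by gradient ascent on the log-probability of the move taken in the best sequence. All function and parameter names (`softmax_policy`, `bias_features`, `bias_weights`, `alpha`) are hypothetical.

```python
import math


def softmax_policy(policy_weights, bias_features, bias_weights):
    """Move probabilities for one state (assumed form of the GNRPA softmax).

    policy_weights[m]   : learned policy weight of move m
    bias_features[m][i] : value of the i-th heuristic bias term for move m
    bias_weights[i]     : weight of the i-th bias term (the quantity being learned)
    """
    logits = [
        w + sum(bw * f for bw, f in zip(bias_weights, feats))
        for w, feats in zip(policy_weights, bias_features)
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_weight_gradient(probs, bias_features, best):
    """Gradient of log p(best) with respect to each bias weight.

    For a softmax with linear logits this is the feature of the chosen
    move minus the expected feature under the current policy.
    """
    n_terms = len(bias_features[0])
    return [
        bias_features[best][i]
        - sum(p * feats[i] for p, feats in zip(probs, bias_features))
        for i in range(n_terms)
    ]


def adapt_bias_weights(bias_weights, probs, bias_features, best, alpha=0.1):
    """One gradient-ascent step on the bias weights (alpha is an assumed step size)."""
    grad = bias_weight_gradient(probs, bias_features, best)
    return [bw + alpha * g for bw, g in zip(bias_weights, grad)]


# Toy state with three candidate moves and two bias terms per move.
policy_weights = [0.0, 0.0, 0.0]
bias_features = [[-1.0, 0.5], [-2.0, 0.1], [-0.5, 0.9]]
bias_weights = [1.0, 1.0]
best = 1  # index of the move played in the best sequence found so far

probs = softmax_policy(policy_weights, bias_features, bias_weights)
new_bias_weights = adapt_bias_weights(bias_weights, probs, bias_features, best)
new_probs = softmax_policy(policy_weights, bias_features, new_bias_weights)
```

In this toy state the update raises the probability of the move from the best sequence, which is the intended effect of adapting the bias weights alongside the policy weights.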

Acknowledgment

Thanks to Clément Royer for advising us to use a gradient when possible. This work was supported in part by the French government under the management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute).

Author information

Corresponding author

Correspondence to Tristan Cazenave.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sentuc, J., Ellouze, F., Lucas, JY., Cazenave, T. (2023). Learning the Bias Weights for Generalized Nested Rollout Policy Adaptation. In: Sellmann, M., Tierney, K. (eds) Learning and Intelligent Optimization. LION 2023. Lecture Notes in Computer Science, vol 14286. Springer, Cham. https://doi.org/10.1007/978-3-031-44505-7_14

  • DOI: https://doi.org/10.1007/978-3-031-44505-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44504-0

  • Online ISBN: 978-3-031-44505-7

  • eBook Packages: Computer Science, Computer Science (R0)
