Abstract
Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for single-player games and optimization problems. In this paper we propose a modification of GNRPA that learns the bias weights automatically. The goal is both to obtain better results on sets of dissimilar instances and to avoid tuning some hyperparameters by hand. Experiments show that this modification improves the algorithm on two different optimization problems: the Vehicle Routing Problem and 3D Bin Packing.
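The abstract does not spell out how a bias weight could be learned, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a playout picks move m with probability proportional to exp(w_m + beta_m), and that the bias has the form beta_m = bias_weight * features[m] with a single scalar bias_weight and one heuristic feature per move (for example, a negated distance in vehicle routing). The function names, the single-feature parameterization, and the step size alpha are all hypothetical.

```python
import math


def softmax_policy(weights, bias):
    """Move probabilities proportional to exp(w_m + beta_m)."""
    logits = [w + b for w, b in zip(weights, bias)]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bias_weight_gradient(probs, features, chosen):
    """Gradient of log p(chosen) with respect to bias_weight.

    Assumes beta_m = bias_weight * features[m] (hypothetical form), which
    gives features[chosen] minus the expected feature under the policy.
    """
    expected = sum(p * f for p, f in zip(probs, features))
    return features[chosen] - expected


def adapt_bias_weight(bias_weight, weights, features, chosen, alpha=1.0):
    """One gradient-ascent step on bias_weight toward the chosen move."""
    bias = [bias_weight * f for f in features]
    probs = softmax_policy(weights, bias)
    return bias_weight + alpha * bias_weight_gradient(probs, features, chosen)


# Three candidate moves with equal policy weights and one feature each.
weights = [0.0, 0.0, 0.0]
features = [1.0, 0.0, -1.0]
new_bias_weight = adapt_bias_weight(0.0, weights, features, chosen=0)
new_probs = softmax_policy(weights, [new_bias_weight * f for f in features])
```

In this sketch, reinforcing the move with the largest feature raises bias_weight, so the heuristic gains influence in later playouts. When the best sequences contradict the heuristic, the same update lowers it, which is one way a search could adjust to dissimilar instances without a hand-tuned bias weight.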
Acknowledgment
Thanks to Clément Royer for advising us to use a gradient when possible. This work was supported in part by the French government under the management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sentuc, J., Ellouze, F., Lucas, JY., Cazenave, T. (2023). Learning the Bias Weights for Generalized Nested Rollout Policy Adaptation. In: Sellmann, M., Tierney, K. (eds) Learning and Intelligent Optimization. LION 2023. Lecture Notes in Computer Science, vol 14286. Springer, Cham. https://doi.org/10.1007/978-3-031-44505-7_14
DOI: https://doi.org/10.1007/978-3-031-44505-7_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44504-0
Online ISBN: 978-3-031-44505-7
eBook Packages: Computer Science (R0)