Improved Diversity in Nested Rollout Policy Adaptation

Edelkamp, Stefan; Cazenave, Tristan

doi:10.1007/978-3-319-46073-4_4


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9904)

Included in the following conference series:

Joint German/Austrian Conference on Artificial Intelligence (Künstliche Intelligenz)


Abstract

For combinatorial search in single-player games, nested Monte-Carlo search is a natural alternative to algorithms like UCT that are applied in two-player and general games. To trade off exploration against exploitation, the randomized search procedure intensifies the search with increasing recursion depth. If a concise mapping from states to actions is available, integrating policy learning yields nested rollout policy adaptation (NRPA), while Beam-NRPA keeps a bounded number of solutions at each recursion level. In this paper, we propose refinements for Beam-NRPA that improve the runtime and the solution diversity.
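To make the policy-learning step more concrete, here is a minimal Python sketch of plain NRPA as Rosin originally formulated it (recursive levels, softmax playouts, and a policy "adapt" step toward the best sequence found so far). It does not implement Beam-NRPA or the refinements proposed in this paper. The toy problem, the `code` mapping from (state, move) to a policy key, and the constants `LENGTH`, `ALPHA`, and `ITERATIONS` are hypothetical choices made only for illustration.

```python
import math
import random

# Hypothetical toy problem (not from the paper): build a sequence of LENGTH
# moves from {0, 1, 2}; position i earns one point if its move equals i % 3.
LENGTH = 12
MOVES = (0, 1, 2)
ALPHA = 1.0       # learning rate of the adapt step (assumed value)
ITERATIONS = 20   # iterations per nesting level (assumed value)


def score(seq):
    return sum(1 for i, m in enumerate(seq) if m == i % 3)


def code(pos, move):
    # Hypothetical state/action encoding: the state is just the position.
    return (pos, move)


def playout(policy, rng):
    """Level-0 rollout: sample each move with probability proportional to exp(weight)."""
    seq = []
    for pos in range(LENGTH):
        weights = [math.exp(policy.get(code(pos, m), 0.0)) for m in MOVES]
        seq.append(rng.choices(MOVES, weights=weights)[0])
    return score(seq), seq


def adapt(policy, seq):
    """Return a new policy shifted toward seq (gradient step on the softmax policy)."""
    new_policy = dict(policy)
    for pos, chosen in enumerate(seq):
        # Probabilities are computed from the OLD policy, as in Rosin's NRPA.
        z = sum(math.exp(policy.get(code(pos, m), 0.0)) for m in MOVES)
        for m in MOVES:
            key = code(pos, m)
            prob = math.exp(policy.get(key, 0.0)) / z
            new_policy[key] = new_policy.get(key, 0.0) - ALPHA * prob
        key = code(pos, chosen)
        new_policy[key] = new_policy.get(key, 0.0) + ALPHA
    return new_policy


def nrpa(level, policy, rng):
    """Nested search; the caller's policy is never mutated because adapt copies it."""
    if level == 0:
        return playout(policy, rng)
    best_score, best_seq = -1, []
    for _ in range(ITERATIONS):
        result_score, result_seq = nrpa(level - 1, policy, rng)
        if result_score >= best_score:
            best_score, best_seq = result_score, result_seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq


if __name__ == "__main__":
    best, seq = nrpa(2, {}, random.Random(0))
    print(best, seq)
```

Beam-NRPA, as summarized in the abstract, would keep a bounded set of candidate solutions per recursion level instead of the single `best_seq` above; that variant is not shown here.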


Notes

1.
We used one core of an Intel® Core™ i5-2520M CPU @ 2.50 GHz × 4. The computer has 8 GB of RAM, but every run of the algorithm on any problem instance used less than 10 MB of main memory. The software infrastructure was as follows: operating system Ubuntu 14.04 LTS, Linux kernel 3.13.0-74-generic, compiler g++ version 4.8.4, and compiler options -O3 -march=native -funroll-loops -std=c++11 -Wall.
2.
http://www.js-games.de/eng/games/samegame.
3.
https://www.sintef.no/projectweb/top/vrptw/solomon-benchmark
http://web.cba.neu.edu/~msolomon/problems.htm.
4.
The sequence of cities we found was 73, 22, 72, 54, 24, 80, 12, 0, 65, 71, 71, 20, 32, 70, 0, 92, 37, 98, 91, 16, 86, 85, 97, 13, 0, 83, 45, 61, 84, 5, 60, 89, 0, 94, 96, 99, 6, 0, 50, 33, 30, 51, 9, 67, 1, 0, 14, 44, 38, 43, 100, 95, 0, 27, 69, 76, 79, 68, 0, 52, 7, 11, 19, 49, 48, 82, 0, 28, 29, 78, 34, 35, 3, 77, 0, 62, 88, 8, 46, 17, 93, 59, 0, 36, 47, 18, 0, 39, 23, 67, 55, 4, 25, 26, 0, 63, 64, 90, 10, 31, 0, 87, 57, 2, 58, 0, 40, 53, 0, 42, 15, 41, 75, 56, 74, 21, 0.
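Note 4 gives the solution as one flat sequence. Assuming, as in the Solomon benchmark files, that city 0 denotes the depot, the sequence can be read as a list of vehicle routes, each closed by a return to the depot. The following sketch splits it accordingly; the helper name `split_routes` is hypothetical.

```python
# Sequence copied from note 4.
# Assumption: city 0 is the depot, as in the Solomon VRPTW benchmark files.
SEQUENCE = [
    73, 22, 72, 54, 24, 80, 12, 0, 65, 71, 71, 20, 32, 70, 0,
    92, 37, 98, 91, 16, 86, 85, 97, 13, 0, 83, 45, 61, 84, 5, 60, 89, 0,
    94, 96, 99, 6, 0, 50, 33, 30, 51, 9, 67, 1, 0, 14, 44, 38, 43, 100, 95, 0,
    27, 69, 76, 79, 68, 0, 52, 7, 11, 19, 49, 48, 82, 0,
    28, 29, 78, 34, 35, 3, 77, 0, 62, 88, 8, 46, 17, 93, 59, 0,
    36, 47, 18, 0, 39, 23, 67, 55, 4, 25, 26, 0, 63, 64, 90, 10, 31, 0,
    87, 57, 2, 58, 0, 40, 53, 0, 42, 15, 41, 75, 56, 74, 21, 0,
]


def split_routes(sequence, depot=0):
    """Split a flat tour into routes, using each depot visit as a separator."""
    routes, current = [], []
    for city in sequence:
        if city == depot:
            if current:  # ignore empty segments, e.g. consecutive depot visits
                routes.append(current)
            current = []
        else:
            current.append(city)
    if current:  # a trailing route without a final depot visit
        routes.append(current)
    return routes


if __name__ == "__main__":
    for i, route in enumerate(split_routes(SEQUENCE), start=1):
        print(i, route)
```

Under this reading, the sequence consists of 17 routes.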


Author information

Authors and Affiliations

Fachbereich Mathematik und Informatik, Universität Bremen, Am Fallturm 1, 28359, Bremen, Germany
Stefan Edelkamp
PSL – Université Paris-Dauphine, LAMSADE UMR CNRS 7243, Place du Maréchal de Lattre de Tassigny, 75775, Paris Cedex 16, France
Tristan Cazenave


Corresponding author

Correspondence to Stefan Edelkamp.

Editor information

Editors and Affiliations

Alpen-Adria Universität Klagenfurt, Klagenfurt, Austria
Gerhard Friedrich
University of Basel, Basel, Switzerland
Malte Helmert
Technische Universität Graz, Graz, Austria
Franz Wotawa


About this paper

Cite this paper

Edelkamp, S., Cazenave, T. (2016). Improved Diversity in Nested Rollout Policy Adaptation. In: Friedrich, G., Helmert, M., Wotawa, F. (eds) KI 2016: Advances in Artificial Intelligence. KI 2016. Lecture Notes in Computer Science, vol 9904. Springer, Cham. https://doi.org/10.1007/978-3-319-46073-4_4


DOI: https://doi.org/10.1007/978-3-319-46073-4_4
Published: 08 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46072-7
Online ISBN: 978-3-319-46073-4
eBook Packages: Computer Science (R0)
