Abstract
Learning a well-informed heuristic function for hard planning domains is an elusive problem. Although neural network architectures capable of representing such heuristic knowledge are known, it is not obvious what concrete information they learn, or whether techniques aimed at understanding their structure help improve the quality of the learned heuristics. This paper presents a network model that, via optimal plan imitation with an attention mechanism, learns a heuristic function able to relate distant parts of the state space, which markedly improves the quality of the learned heuristic. Learning is further improved by curriculum learning: newly solved problem instances are added to the training set, which in turn helps to solve problems of higher complexity and to train on harder instances. The resulting method substantially outperforms all evaluated baselines, including known deep learning approaches and classical planning heuristics. We demonstrate its effectiveness on grid-based PDDL domains, namely Sokoban, maze-with-teleports, and sliding tile puzzles.
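The curriculum loop sketched in the abstract, i.e. training by optimal plan imitation and folding newly solved instances back into the training set, can be illustrated as follows. All names here (`train_step`, `solve`, the instance pools, the time limit) are hypothetical, not the authors' actual interfaces:

```python
# Minimal sketch of the curriculum loop from the abstract:
# train a heuristic by imitating optimal plans, then add newly
# solved (harder) instances back into the training set.
# All names are illustrative, not the authors' actual API.

def curriculum_loop(model, solved_pool, unsolved_pool, rounds=10):
    for _ in range(rounds):
        # Imitation learning: fit the heuristic to optimal plans
        # of instances that are already solved.
        for instance, optimal_plan in solved_pool:
            model.train_step(instance, optimal_plan)

        # Try harder instances with the improved heuristic.
        newly_solved = []
        for instance in list(unsolved_pool):
            plan = model.solve(instance, time_limit=600)
            if plan is not None:
                newly_solved.append((instance, plan))
                unsolved_pool.remove(instance)

        if not newly_solved:
            break  # no progress: the curriculum has converged
        solved_pool.extend(newly_solved)
    return model
```

Each round makes the heuristic stronger on the current pool, which is what allows the search to reach instances that were previously out of budget.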
Notes
- 1.
The composability of harmonic functions is based on the identity \(\cos (\theta _1 + \theta _2) = \cos (\theta _1)\cos (\theta _2) - \sin (\theta _1)\sin (\theta _2) = (\cos (\theta _1), \sin (\theta _1)) \cdot (\cos (\theta _2), -\sin (\theta _2)),\) where \(\cdot \) denotes the inner product of two vectors; this inner product appears in Eq. (1) as the inner product of \({\textbf {q}}_{u,v}\) and \({\textbf {k}}_{r,s}\).
- 2.
Convolution layers are appropriately padded to preserve sizes.
- 3.
- 4.
- 5.
- 6.
Available at https://github.com/deepmind/boxoban-levels.
- 7.
- 8.
The planners and NNs were given 10 minutes to solve each maze instance.
- 9.
The planners and NNs were given 10 minutes to solve each maze instance.
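The angle-addition identity underlying the composability property in note 1 can be checked numerically; this snippet is only an illustration of the identity, not code from the paper:

```python
import math

# cos(a + b) = cos(a)cos(b) - sin(a)sin(b)
#            = (cos(a), sin(a)) . (cos(b), -sin(b))
# This inner-product form is what lets positional encodings express
# relative offsets through the q . k inner products of attention.

def inner(u, v):
    """Inner product of two same-length vectors."""
    return sum(x * y for x, y in zip(u, v))

a, b = 0.7, 1.9
lhs = math.cos(a + b)
rhs = inner((math.cos(a), math.sin(a)), (math.cos(b), -math.sin(b)))
assert abs(lhs - rhs) < 1e-12
```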
Acknowledgments
This work has been supported by grants 22-32620S and 22-30043S from the Czech Science Foundation and by OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics”.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chrestien, L., Pevný, T., Edelkamp, S., Komenda, A. (2023). Heuristic Search Optimisation Using Planning and Curriculum Learning Techniques. In: Moniz, N., Vale, Z., Cascalho, J., Silva, C., Sebastião, R. (eds) Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science(), vol 14115. Springer, Cham. https://doi.org/10.1007/978-3-031-49008-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49007-1
Online ISBN: 978-3-031-49008-8