Abstract
Learning a well-informed heuristic function for hard planning domains is an elusive problem. Although neural network architectures capable of representing such heuristic knowledge are known, it is not obvious what concrete information they learn, or whether techniques aimed at understanding their structure help improve the quality of the learned heuristics. This paper presents a network model that, via optimal plan imitation with an attention mechanism, learns a heuristic function able to relate distant parts of the state space, which markedly improves the quality of the learned heuristic. Learning is further improved by curriculum learning: newly solved problem instances are added to the training set, which in turn helps to solve problems of higher complexity and to train on harder instances. The resulting method substantially outperforms all evaluated baselines, including known deep learning approaches and classical planning heuristics. We demonstrate its effectiveness on grid-based PDDL domains, namely Sokoban, maze-with-teleports, and sliding tile puzzles.
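The curriculum loop sketched in the abstract, i.e. training by optimal plan imitation and folding newly solved instances back into the training set, can be illustrated as follows. All names here (`train_step`, `solve`, the instance pools, the time limit) are hypothetical, not the authors' actual interfaces:

```python
# Minimal sketch of the curriculum loop from the abstract:
# train a heuristic by imitating optimal plans, then add newly
# solved (harder) instances back into the training set.
# All names are illustrative, not the authors' actual API.

def curriculum_loop(model, solved_pool, unsolved_pool, rounds=10):
    for _ in range(rounds):
        # Imitation learning: fit the heuristic to optimal plans
        # of instances that are already solved.
        for instance, optimal_plan in solved_pool:
            model.train_step(instance, optimal_plan)

        # Try harder instances with the improved heuristic.
        newly_solved = []
        for instance in list(unsolved_pool):
            plan = model.solve(instance, time_limit=600)
            if plan is not None:
                newly_solved.append((instance, plan))
                unsolved_pool.remove(instance)

        if not newly_solved:
            break  # no progress: the curriculum has converged
        solved_pool.extend(newly_solved)
    return model
```

Each round makes the heuristic stronger on the current pool, which is what allows the search to reach instances that were previously out of budget.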
Notes
- 1.
The composability of harmonic functions is based on the identity \(\cos (\theta _1 + \theta _2) = \cos (\theta _1)\cos (\theta _2) - \sin (\theta _1)\sin (\theta _2) = (\cos (\theta _1), \sin (\theta _1)) \cdot (\cos (\theta _2), -\sin (\theta _2)),\) where \(\cdot \) denotes the inner product of two vectors; this inner product appears in Eq. (1) as the inner product of \({\textbf {q}}_{u,v}\) and \({\textbf {k}}_{r,s}\).
- 2.
Convolution layers are appropriately padded to preserve sizes.
- 3.
- 4.
- 5.
- 6.
Available at https://github.com/deepmind/boxoban-levels.
- 7.
- 8.
The planners and NNs were given 10 minutes to solve each maze instance.
- 9.
The planners and NNs were given 10 minutes to solve each maze instance.
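The angle-addition identity underlying the composability property in note 1 can be checked numerically; this snippet is only an illustration of the identity, not code from the paper:

```python
import math

# cos(a + b) = cos(a)cos(b) - sin(a)sin(b)
#            = (cos(a), sin(a)) . (cos(b), -sin(b))
# This inner-product form is what lets positional encodings express
# relative offsets through the q . k inner products of attention.

def inner(u, v):
    """Inner product of two same-length vectors."""
    return sum(x * y for x, y in zip(u, v))

a, b = 0.7, 1.9
lhs = math.cos(a + b)
rhs = inner((math.cos(a), math.sin(a)), (math.cos(b), -math.sin(b)))
assert abs(lhs - rhs) < 1e-12
```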
Acknowledgments
This work has been supported by grants 22-32620S and 22-30043S from the Czech Science Foundation and by OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics”.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chrestien, L., Pevný, T., Edelkamp, S., Komenda, A. (2023). Heuristic Search Optimisation Using Planning and Curriculum Learning Techniques. In: Moniz, N., Vale, Z., Cascalho, J., Silva, C., Sebastião, R. (eds) Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science(), vol 14115. Springer, Cham. https://doi.org/10.1007/978-3-031-49008-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49007-1
Online ISBN: 978-3-031-49008-8