Heuristic Search Optimisation Using Planning and Curriculum Learning Techniques

  • Conference paper
Progress in Artificial Intelligence (EPIA 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14115)

Abstract

Learning a well-informed heuristic function for hard planning domains is an elusive problem. Although there are known neural network architectures to represent such heuristic knowledge, it is not obvious what concrete information is learned and whether techniques aimed at understanding the structure help in improving the quality of the heuristics. This paper presents a network model that learns a heuristic function by imitating optimal plans. The model uses the attention mechanism to relate distant parts of the state space, which drastically improves the learning of a good heuristic function. Learning is further improved by curriculum learning, in which newly solved problem instances are added to the training set; this, in turn, helps to solve problems of higher complexity and to train on harder problem instances. The methods used in this paper far exceed the performance of all existing baselines, including known deep learning approaches and classical planning heuristics. We demonstrate their effectiveness on grid-type PDDL domains, namely Sokoban, maze-with-teleports and sliding tile puzzles.
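
The curriculum idea in the abstract (solve what the current heuristic can solve, add those instances to the training set, retrain, try the harder ones again) can be sketched as a generic loop. This is only an illustration under stated assumptions, not the authors' implementation: `curriculum_loop`, `toy_solve` and `toy_train` are hypothetical names, and the toy "model" is a single integer standing in for a trained network.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Problem = TypeVar("Problem")
Model = TypeVar("Model")
Plan = List[int]  # placeholder plan type used only in this sketch


def curriculum_loop(
    problems: List[Problem],
    model: Model,
    solve: Callable[[Model, Problem], Optional[Plan]],
    train: Callable[[Model, List[Tuple[Problem, Plan]]], Model],
    max_rounds: int,
) -> Tuple[Model, List[Tuple[Problem, Plan]], List[Problem]]:
    """Grow the training set with newly solved instances, then retrain."""
    training_set: List[Tuple[Problem, Plan]] = []
    unsolved = list(problems)
    for _ in range(max_rounds):
        newly_solved, still_unsolved = [], []
        for problem in unsolved:
            plan = solve(model, problem)  # search guided by the current model
            if plan is None:
                still_unsolved.append(problem)
            else:
                newly_solved.append((problem, plan))
        if not newly_solved:
            break  # no progress: retraining on the same data would not help
        training_set.extend(newly_solved)
        model = train(model, training_set)  # imitate the plans found so far
        unsolved = still_unsolved
        if not unsolved:
            break
    return model, training_set, unsolved


# Toy stand-ins (assumptions): a problem is its difficulty, the model is the
# largest difficulty it can currently handle.
def toy_solve(model: int, difficulty: int) -> Optional[Plan]:
    return list(range(difficulty)) if difficulty <= model else None


def toy_train(model: int, data: List[Tuple[int, Plan]]) -> int:
    return max(difficulty for difficulty, _ in data) + 1


final_model, solved, unsolved = curriculum_loop(
    [1, 2, 3, 4, 5], 1, toy_solve, toy_train, max_rounds=5
)
print(final_model, [p for p, _ in solved], unsolved)
```

In the toy run, each round solves exactly one more instance than the previous one, which mirrors how training on newly solved instances is meant to unlock harder ones.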

Notes

  1. The composability of harmonic functions is based on the property \(\cos (\theta _1 + \theta _2) = \cos (\theta _1)\cos (\theta _2) - \sin (\theta _1)\sin (\theta _2) = (\cos (\theta _1), \sin (\theta _1)) \cdot (\cos (\theta _2), -\sin (\theta _2)),\) where \(\cdot \) denotes the inner product of two vectors. This inner product appears in Eq. (1) as the inner product of \({\textbf {q}}_{u,v}\) and \({\textbf {k}}_{r,s}\).

  2. Convolution layers are appropriately padded to preserve sizes.

  3. https://github.com/ravenkls/Maze-Generator-and-Solver.

  4. https://github.com/levilelis/h-levin/.

  5. https://github.com/YahyaAlaaMassoud/Sliding-Puzzle-A-Star-Solver.

  6. Available at https://github.com/deepmind/boxoban-levels.

  7. https://github.com/deepmind/boxoban-levels/blob/master/unfiltered/test/000.txt.

  8. The planners and NNs were given 10 minutes to solve each maze instance.

  9. The planners and NNs were given 10 minutes to solve each maze instance.
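
The angle-addition property in note 1 can be checked numerically. The sketch below writes the second vector as \((\cos (\theta _2), -\sin (\theta _2))\) so that the inner product equals \(\cos (\theta _1 + \theta _2)\), and also shows the companion identity for \(\cos (\theta _1 - \theta _2)\). The function names are hypothetical, and the mapping to \({\textbf {q}}_{u,v}\) and \({\textbf {k}}_{r,s}\) in Eq. (1) is not reproduced here because that equation is not part of this excerpt.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def embed(theta: float) -> Vec2:
    """Harmonic embedding of an angle as (cos, sin)."""
    return (math.cos(theta), math.sin(theta))


def reflect(v: Vec2) -> Vec2:
    """Flip the sign of the sine component."""
    return (v[0], -v[1])


def dot(a: Vec2, b: Vec2) -> float:
    """Inner product of two 2-D vectors."""
    return a[0] * b[0] + a[1] * b[1]


def cos_of_sum(theta1: float, theta2: float) -> float:
    # cos(t1 + t2) = cos t1 cos t2 - sin t1 sin t2
    return dot(embed(theta1), reflect(embed(theta2)))


def cos_of_difference(theta1: float, theta2: float) -> float:
    # cos(t1 - t2) = cos t1 cos t2 + sin t1 sin t2
    return dot(embed(theta1), embed(theta2))


for t1, t2 in [(0.3, 1.1), (2.0, -0.7), (math.pi / 4, math.pi / 3)]:
    assert math.isclose(cos_of_sum(t1, t2), math.cos(t1 + t2), abs_tol=1e-12)
    assert math.isclose(cos_of_difference(t1, t2), math.cos(t1 - t2), abs_tol=1e-12)
```

The point of the check is that an inner product of two fixed-size vectors recovers a function of the combined angle, which is what lets an attention score depend on how two positions relate.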

Acknowledgments

This work has been supported by projects 22-32620S and 22-30043S of the Czech Science Foundation and by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics”.

Author information

Corresponding author

Correspondence to Leah Chrestien.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chrestien, L., Pevný, T., Edelkamp, S., Komenda, A. (2023). Heuristic Search Optimisation Using Planning and Curriculum Learning Techniques. In: Moniz, N., Vale, Z., Cascalho, J., Silva, C., Sebastião, R. (eds) Progress in Artificial Intelligence. EPIA 2023. Lecture Notes in Computer Science (LNAI), vol 14115. Springer, Cham. https://doi.org/10.1007/978-3-031-49008-8_39

  • DOI: https://doi.org/10.1007/978-3-031-49008-8_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-49007-1

  • Online ISBN: 978-3-031-49008-8

  • eBook Packages: Computer Science, Computer Science (R0)
