Abstract
Transfer learning can speed up training in machine learning and is regularly used in classification tasks. It reuses prior knowledge from other tasks to pre-train networks for new tasks. In reinforcement learning, however, learning a behavior policy that transfers to new environments remains a challenge, especially for tasks that involve much planning. Sokoban is a challenging puzzle game that has been widely used as a benchmark in planning-based reinforcement learning. In this paper, we show how prior knowledge improves learning in Sokoban tasks. We find that reusing previously learned feature representations can accelerate learning on new, more complex instances; in effect, we show how curriculum learning, from simple to complex tasks, works in Sokoban. Furthermore, feature representations learned on simpler instances are more general, and thus lead to positive transfer towards more complex tasks, but not vice versa. We also study which part of the knowledge is most important for transfer to succeed, and identify which layers should be used for pre-training. (The code we used for this work can be found at https://github.com/yangzhao-666/TLCLS.)
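To make the transfer scheme concrete, the sketch below shows one plausible way to reuse convolutional feature layers across Sokoban instances in PyTorch. The architecture, the layer split, and the `transfer_features` helper are illustrative assumptions for this sketch, not the authors' exact setup; see the linked repository for the actual implementation.

```python
# Minimal sketch (assumed setup, not the paper's exact architecture):
# pre-train an actor-critic CNN on simple Sokoban levels, then copy its
# convolutional feature layers into a fresh network for harder levels.
import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    """Small actor-critic CNN over Sokoban grid observations (illustrative)."""

    def __init__(self, in_channels: int = 3, n_actions: int = 8):
        super().__init__()
        # Feature layers: the part of the network considered for transfer.
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
        )
        # Task-specific heads: re-initialized for each new task.
        self.policy = nn.Linear(64 * 8 * 8, n_actions)
        self.value = nn.Linear(64 * 8 * 8, 1)

    def forward(self, obs: torch.Tensor):
        h = self.features(obs)
        return self.policy(h), self.value(h)


def transfer_features(source: ActorCritic, target: ActorCritic,
                      freeze: bool = False) -> ActorCritic:
    """Copy feature weights from a pre-trained net; optionally freeze them."""
    target.features.load_state_dict(source.features.state_dict())
    if freeze:
        for p in target.features.parameters():
            p.requires_grad = False  # pure feature reuse, no fine-tuning
    return target


# Curriculum usage: pre-train on simple levels, warm-start the harder task.
simple_net = ActorCritic()
# ... train simple_net with an actor-critic method on easy levels ...
hard_net = transfer_features(simple_net, ActorCritic(), freeze=False)
# ... continue training hard_net on harder levels ...
```

Freezing the copied layers corresponds to pure feature reuse, while leaving them trainable allows fine-tuning on the harder instances; re-initializing the policy and value heads reflects that these are task-specific, whereas the lower feature layers are the candidates for transfer.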
Acknowledgement
Zhao Yang is financially supported by the China Scholarship Council (CSC). Computational support was provided by ALICE and DSLab. The authors thank Hui Wang, Matthias Müller-Brockhausen, Michiel van der Meer, Thomas Moerland, and all members of the Leiden Reinforcement Learning Group for helpful discussions.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Z., Preuss, M., Plaat, A. (2022). Transfer Learning and Curriculum Learning in Sokoban. In: Leiva, L.A., Pruski, C., Markovich, R., Najjar, A., Schommer, C. (eds) Artificial Intelligence and Machine Learning. BNAIC/Benelearn 2021. Communications in Computer and Information Science, vol 1530. Springer, Cham. https://doi.org/10.1007/978-3-030-93842-0_11
DOI: https://doi.org/10.1007/978-3-030-93842-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93841-3
Online ISBN: 978-3-030-93842-0
eBook Packages: Computer Science, Computer Science (R0)