
SLER: Self-generated long-term experience replay for continual reinforcement learning

Published in Applied Intelligence

Abstract

Deep reinforcement learning has achieved significant success in various domains, but it still faces a major challenge when learning multiple tasks in sequence. Interaction in a complex setting involves continual learning, in which the data distribution changes over time. A continual learning system should ensure that the agent acquires new knowledge without forgetting previous knowledge; however, catastrophic forgetting can occur when new experience overwrites old experience under a limited memory budget. The dual experience replay algorithm, which retains previous experience, is widely used to reduce forgetting, but it does not scale to many tasks when memory is constrained. To alleviate this memory constraint, we propose a new continual reinforcement learning algorithm called Self-generated Long-term Experience Replay (SLER). Our method differs from the standard dual experience replay algorithm, which uses a short-term experience replay to retain the current task's experience and a long-term experience replay to retain the experience of all past tasks. In this paper, we first train an environment sample model, called the Experience Replay Model (ERM), to generate simulated state sequences of previous tasks for knowledge retention. We then combine the ERM's output with the experience of the new task to generate simulated experience for all previous tasks, alleviating forgetting. Our method effectively reduces the memory required for multi-task reinforcement learning. We show that in the StarCraft II and GridWorld environments, our method outperforms the state-of-the-art deep learning methods and achieves results comparable to the dual experience replay method, which retains the experience of all tasks.
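The replay scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `GenerativeReplayBuffer`, its `generator` stub (standing in for a trained ERM), and `replay_ratio` are hypothetical names chosen for this example. The key idea it shows is bounding real memory to the current task while a generative model supplies simulated transitions for all previous tasks.

```python
import random

class GenerativeReplayBuffer:
    """Sketch of a dual-buffer scheme in the spirit of SLER: a small
    short-term buffer holds real transitions from the current task,
    while a generative model (here a stub callable) replays simulated
    transitions for previous tasks, so old experience need not be
    stored verbatim."""

    def __init__(self, capacity, generator):
        self.capacity = capacity      # short-term memory limit
        self.short_term = []          # real current-task transitions
        self.generator = generator    # stands in for the trained ERM

    def add(self, transition):
        # FIFO eviction keeps real memory bounded regardless of how
        # many tasks the agent has seen.
        if len(self.short_term) >= self.capacity:
            self.short_term.pop(0)
        self.short_term.append(transition)

    def sample(self, batch_size, replay_ratio=0.5):
        # Mix real new-task experience with generated old-task
        # experience; replay_ratio controls the generated fraction.
        n_generated = int(batch_size * replay_ratio)
        n_real = batch_size - n_generated
        real = random.sample(self.short_term,
                             min(n_real, len(self.short_term)))
        generated = [self.generator() for _ in range(n_generated)]
        return real + generated

# Usage: a constant-output generator stands in for the learned model.
buf = GenerativeReplayBuffer(
    capacity=100,
    generator=lambda: ("sim_state", 0, 0.0, "sim_next"))
for t in range(10):
    buf.add(("s%d" % t, t, 1.0, "s%d" % (t + 1)))
batch = buf.sample(8)  # 4 real + 4 generated at replay_ratio=0.5
```

In the actual paper the generator is itself trained continually, and the simulated sequences are produced by the ERM rather than a fixed stub; this sketch only illustrates how mixing the two experience sources keeps the memory footprint independent of the number of past tasks.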





Acknowledgements

We thank our colleagues for their collaboration on the present work. We also thank all the reviewers for their specific comments and suggestions. This work is supported by the National Key Research and Development Plan (2018YFC0832300).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yinliang Zhao.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, C., Li, Y., Zhao, Y. et al. SLER: Self-generated long-term experience replay for continual reinforcement learning. Appl Intell 51, 185–201 (2021). https://doi.org/10.1007/s10489-020-01786-1
