Abstract
Deep reinforcement learning has achieved significant success in many domains, but it still faces a major challenge when learning multiple tasks in sequence. Interaction in a complex setting involves continual learning, in which the data distribution changes over time. A continual learning system should ensure that the agent acquires new knowledge without forgetting what it learned before. However, catastrophic forgetting may occur, because new experience can overwrite previous experience when the memory size is limited. The dual experience replay algorithm, which retains past experience, is widely used to reduce forgetting, but it does not scale to many tasks when the memory size is constrained. To alleviate this memory constraint, we propose a new continual reinforcement learning algorithm called Self-generated Long-term Experience Replay (SLER). Our method differs from the standard dual experience replay algorithm, which uses a short-term experience replay to retain the current task's experience and a long-term experience replay to retain the experience of all past tasks. We first train an environment sample model, called the Experience Replay Model (ERM), to generate simulated state sequences of previous tasks for knowledge retention. We then combine the ERM with the experience of the new task to generate simulated experience for all previous tasks, alleviating forgetting. Our method effectively decreases the memory required for multi-task reinforcement learning. In the StarCraft II and GridWorld environments, SLER performs better than the state-of-the-art deep learning methods and achieves results comparable to the dual experience replay method that retains the experience of all tasks.
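The abstract's core idea, replacing a stored long-term buffer with a generator of past-task experience, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the class and the toy generator stand in for the trained ERM, and all names here are hypothetical.

```python
import random

class GenerativeReplayBuffer:
    """Toy sketch of generative replay: a small short-term buffer holds real
    transitions from the current task, while a generator (standing in for the
    trained ERM) supplies simulated past-task transitions at sampling time."""

    def __init__(self, capacity, past_fraction=0.5):
        self.capacity = capacity              # short-term buffer size (current task only)
        self.past_fraction = past_fraction    # share of each batch drawn from the generator
        self.short_term = []                  # real transitions from the current task
        self.generator = None                 # hypothetical stand-in for the ERM

    def add(self, transition):
        if len(self.short_term) >= self.capacity:
            self.short_term.pop(0)            # overwrite the oldest transition when full
        self.short_term.append(transition)

    def sample(self, batch_size):
        n_past = int(batch_size * self.past_fraction) if self.generator else 0
        batch = [self.generator() for _ in range(n_past)]      # simulated past-task data
        batch += random.sample(self.short_term,
                               min(batch_size - n_past, len(self.short_term)))
        return batch

# Usage: after finishing task A, plug in a generator of task-A-like transitions,
# then collect task-B experience while every batch still replays "task A" data.
buf = GenerativeReplayBuffer(capacity=100)
buf.generator = lambda: ("simulated_state", "a", 0.0, "simulated_next")  # toy ERM
for t in range(50):
    buf.add((f"s{t}", "a", 1.0, f"s{t+1}"))   # real current-task transitions
batch = buf.sample(8)                          # 4 simulated + 4 real transitions
```

The design point this illustrates: memory grows with the size of one task's buffer plus the generator's parameters, rather than with the number of tasks.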

















Acknowledgements
We thank our colleagues for their collaboration on the present work. We also thank the reviewers for their detailed comments and suggestions. This work is supported by the National Key Research and Development Plan (2018YFC0832300).
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Cite this article
Li, C., Li, Y., Zhao, Y. et al. SLER: Self-generated long-term experience replay for continual reinforcement learning. Appl Intell 51, 185–201 (2021). https://doi.org/10.1007/s10489-020-01786-1