Abstract
Memory-based Deep Reinforcement Learning (DRL) has been successfully applied to solve vision-based control tasks from high-dimensional sensory data. While most of this work uses the Long Short-Term Memory (LSTM) as the memory module of the agent, recent developments have revisited and extended the original LSTM formulation. These include the ConvLSTM, a convolutional implementation of the LSTM; the MDN-RNN, which combines a Mixture Density Network with an LSTM; and the GridLSTM, a multidimensional grid of LSTM cells. It remains unclear, however, how these memory modules compare to each other in terms of agent performance when applied in the context of DRL. This work performs a comparative study of several memory-based DRL agents built on the LSTM, ConvLSTM, MDN-RNN and GridLSTM memory modules. The results obtained seem to support the claim that, in some cases, these more recent memory modules can improve agent performance to varying degrees when compared to a baseline LSTM agent. The experimental results were validated on the Atari 2600 videogame platform.
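To make the comparison concrete, the sketch below shows one step of a plain LSTM cell, the baseline memory module named in the abstract. It is a minimal pure-Python illustration with illustrative sizes and random weights, not the agents' actual implementation; the ConvLSTM variant would replace the matrix-vector products with convolutions over spatial feature maps.

```python
import math
import random

def lstm_cell_step(x, h, c, W, U, b):
    """One step of a standard LSTM cell (Hochreiter & Schmidhuber, 1997).

    x: input vector; h, c: previous hidden and cell states.
    W, U, b: input weights, recurrent weights and biases, keyed by
    gate name ("i", "f", "o", "g").
    """
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def affine(gate, k):
        # Wx + Uh + b for unit k of the given gate.
        return (sum(W[gate][k][j] * x[j] for j in range(len(x)))
                + sum(U[gate][k][j] * h[j] for j in range(len(h)))
                + b[gate][k])

    n = len(h)
    i = [sigmoid(affine("i", k)) for k in range(n)]      # input gate
    f = [sigmoid(affine("f", k)) for k in range(n)]      # forget gate
    o = [sigmoid(affine("o", k)) for k in range(n)]      # output gate
    g = [math.tanh(affine("g", k)) for k in range(n)]    # candidate state
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
    return h_new, c_new

# Tiny usage example: roll the cell over a short observation sequence.
# Sizes and weight ranges are purely illustrative.
random.seed(0)
n_in, n_hid = 3, 2
gates = ("i", "f", "o", "g")
W = {gt: [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
          for _ in range(n_hid)] for gt in gates}
U = {gt: [[random.uniform(-0.1, 0.1) for _ in range(n_hid)]
          for _ in range(n_hid)] for gt in gates}
b = {gt: [0.0] * n_hid for gt in gates}
h, c = [0.0] * n_hid, [0.0] * n_hid
for x in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
    h, c = lstm_cell_step(x, h, c, W, U, b)
print(len(h), all(-1.0 < v < 1.0 for v in h))
```

In a memory-based DRL agent, `h` carries information across time steps of a partially observable environment; the compared modules differ only in how this recurrent state is structured and updated.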
Acknowledgements
This research was funded by Fundação para a Ciência e a Tecnologia, grant numbers SFRH/BD/145723/2019 and UID/CEC/00127/2019.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Duarte, F.F., Lau, N., Pereira, A., Reis, L.P. (2024). Study on LSTM and ConvLSTM Memory-Based Deep Reinforcement Learning. In: Rocha, A.P., Steels, L., van den Herik, J. (eds.) Agents and Artificial Intelligence. ICAART 2023. Lecture Notes in Computer Science, vol. 14546. Springer, Cham. https://doi.org/10.1007/978-3-031-55326-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55325-7
Online ISBN: 978-3-031-55326-4