
Guided Reinforcement Learning via Sequence Learning

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2020 (ICANN 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)


Abstract

Applications of Reinforcement Learning (RL) suffer from high sample complexity due to sparse reward signals and inadequate exploration. Novelty Search (NS) can serve as an auxiliary task in this regard, encouraging exploration towards unseen behaviors. However, NS suffers from critical drawbacks concerning scalability and generalizability, since it is based on instance learning. To address these challenges, we previously proposed a generic approach that uses unsupervised learning to learn representations of agent behaviors and employs reconstruction losses as novelty scores. However, that approach considered only fixed-length sequences and did not exploit the sequential structure of behaviors. We therefore extend it here with sequential auto-encoders that capture sequential dependencies. Experimental results on benchmark tasks show that this sequence learning aids exploration, outperforming previous novelty search methods.
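To make the mechanism concrete, the following is a minimal sketch (in PyTorch, not the authors' implementation) of the core idea the abstract describes: a GRU-based sequence auto-encoder is trained on trajectories of agent behavior, and the per-trajectory reconstruction error serves as the novelty score. All class names, function names, and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only: a GRU sequence auto-encoder whose reconstruction
# error on a behavior trajectory acts as a novelty score. Names and sizes
# are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    def __init__(self, behavior_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.GRU(behavior_dim, latent_dim, batch_first=True)
        self.decoder = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.out = nn.Linear(latent_dim, behavior_dim)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, behavior_dim)
        _, h = self.encoder(seq)                # summarize the whole sequence
        z = h[-1].unsqueeze(1).repeat(1, seq.size(1), 1)  # latent per step
        dec, _ = self.decoder(z)
        return self.out(dec)                    # reconstructed sequence

def novelty_score(model: SeqAutoencoder, seq: torch.Tensor) -> torch.Tensor:
    # Behaviors the auto-encoder reconstructs poorly are unlike those it was
    # trained on, so a high reconstruction error signals a novel behavior.
    with torch.no_grad():
        return ((model(seq) - seq) ** 2).mean(dim=(1, 2))
```

In a novelty-guided setting, such a score would typically be mixed into the optimization objective as an exploration bonus, e.g. fitness = reward + beta * novelty for some weighting beta, while the auto-encoder is periodically retrained on the behaviors observed so far.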



Acknowledgement

In part, the authors of this work were supported by the Fraunhofer Research Center for Machine Learning (RCML) within the Fraunhofer Cluster of Excellence Cognitive Internet Technologies (CCIT). We gratefully acknowledge this support.

Author information


Corresponding author

Correspondence to Rajkumar Ramamurthy.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Ramamurthy, R., Sifa, R., Lübbering, M., Bauckhage, C. (2020). Guided Reinforcement Learning via Sequence Learning. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_27


  • DOI: https://doi.org/10.1007/978-3-030-61616-8_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science, Computer Science (R0)
