
Guided Reinforcement Learning via Sequence Learning

  • Conference paper
  • In: Artificial Neural Networks and Machine Learning – ICANN 2020 (ICANN 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)


Abstract

Applications of Reinforcement Learning (RL) suffer from high sample complexity due to sparse reward signals and inadequate exploration. Novelty Search (NS) can serve as an auxiliary task in this regard, encouraging exploration towards unseen behaviors. However, NS suffers from critical drawbacks concerning scalability and generalizability, since it is based on instance learning. To address these challenges, we previously proposed a generic approach that uses unsupervised learning to learn representations of agent behaviors and employs reconstruction losses as novelty scores. However, that approach considered only fixed-length sequences and did not exploit the sequential structure of behaviors. We therefore extend it here with sequential auto-encoders that capture sequential dependencies. Experimental results on benchmark tasks show that this sequence learning aids exploration, outperforming previous novelty search methods.
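To make the mechanism concrete, the following is a minimal sketch (in PyTorch, not the authors' implementation) of the core idea the abstract describes: a GRU-based sequence auto-encoder is trained on trajectories of agent behavior, and the per-trajectory reconstruction error serves as the novelty score. All class names, function names, and hyperparameters here are illustrative assumptions.

```python
# Illustrative sketch only: a GRU sequence auto-encoder whose reconstruction
# error on a behavior trajectory acts as a novelty score. Names and sizes
# are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    def __init__(self, behavior_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.GRU(behavior_dim, latent_dim, batch_first=True)
        self.decoder = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.out = nn.Linear(latent_dim, behavior_dim)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, behavior_dim)
        _, h = self.encoder(seq)                # summarize the whole sequence
        z = h[-1].unsqueeze(1).repeat(1, seq.size(1), 1)  # latent per step
        dec, _ = self.decoder(z)
        return self.out(dec)                    # reconstructed sequence

def novelty_score(model: SeqAutoencoder, seq: torch.Tensor) -> torch.Tensor:
    # Behaviors the auto-encoder reconstructs poorly are unlike those it was
    # trained on, so a high reconstruction error signals a novel behavior.
    with torch.no_grad():
        return ((model(seq) - seq) ** 2).mean(dim=(1, 2))
```

In a novelty-guided setting, such a score would typically be mixed into the optimization objective as an exploration bonus, e.g. fitness = reward + beta * novelty for some weighting beta, while the auto-encoder is periodically retrained on the behaviors observed so far.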



Acknowledgement

In part, the authors of this work were supported by the Fraunhofer Research Center for Machine Learning (RCML) within the Fraunhofer Cluster of Excellence Cognitive Internet Technologies (CCIT). We gratefully acknowledge this support.

Author information


Corresponding author

Correspondence to Rajkumar Ramamurthy.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Ramamurthy, R., Sifa, R., Lübbering, M., Bauckhage, C. (2020). Guided Reinforcement Learning via Sequence Learning. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_27


  • DOI: https://doi.org/10.1007/978-3-030-61616-8_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-61615-1

  • Online ISBN: 978-3-030-61616-8

  • eBook Packages: Computer Science, Computer Science (R0)
