Abstract
Observable states play a significant role in Reinforcement Learning (RL), and the performance of an RL agent is strongly tied to the quality of its inferred hidden states. Accurately extracting hidden states is challenging because they often depend on both the environment's and the agent's histories and demand extensive domain knowledge. In this work, we aim to leverage history information to improve agent performance. First, we show that both neglecting history information and processing it in the usual way harm the agent's performance. Second, we propose a novel model that combines the advantages of supervised learning and RL: we extend the classical policy-gradient framework and extract history information with recurrent neural networks. Third, we evaluate our model in simulated physical control environments, where it outperforms state-of-the-art models and performs markedly better on the more challenging tasks. Finally, we analyze the results and suggest possible approaches for extending and scaling up the model.
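To make the core idea concrete, below is a minimal, hypothetical sketch of a deterministic actor that summarizes the observation history with a recurrent network, in the spirit of a DDPG-style agent extended with an LSTM. The architecture, layer sizes, dimensions, and all names here are illustrative assumptions, not details taken from the paper; the authors' actual implementation is in the repository linked in the notes below.

```python
# Sketch only: a recurrent deterministic actor in PyTorch. The choice of
# LSTM, the layer sizes, and the action bounds are assumptions for
# illustration, not the paper's exact architecture.
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Maps an observation history to a deterministic continuous action.

    An LSTM summarizes the sequence of raw observations into a hidden
    state, which stands in for the unobserved environment state; a
    feed-forward head then emits a bounded action.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, act_dim),
            nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden carries the recurrent
        # state across calls so the policy can act step by step online.
        out, hidden = self.lstm(obs_seq, hidden)
        return self.head(out), hidden


# Acting online: feed one observation at a time, threading the hidden
# state so that each action conditions on the full history so far.
actor = RecurrentActor(obs_dim=17, act_dim=6)  # dimensions are illustrative
obs = torch.randn(1, 1, 17)                    # (batch=1, time=1, obs_dim)
hidden = None
action, hidden = actor(obs, hidden)            # hidden now encodes history
```

Keeping the recurrent state outside the module, as above, lets the same network be used both for stepwise action selection during rollouts and for full-sequence training on stored trajectories.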
Notes
1. Code available at https://github.com/cheunglei/drd.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61300049, 61763003; the National Key R&D Program of China under Grant No. 2017YFB1003103; and the Natural Science Research Foundation of Jilin Province of China under Grant Nos. 20180101053JC, 20190201193JC.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Zhang, L., Han, S., Zhang, Z., Li, L., Lü, S. (2020). Deep Recurrent Deterministic Policy Gradient for Physical Control. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_21
DOI: https://doi.org/10.1007/978-3-030-61616-8_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8
eBook Packages: Computer Science; Computer Science (R0)