Abstract
The deep deterministic policy gradient (DDPG) algorithm is an attractive reinforcement learning method that directly optimizes the policy and performs well on many continuous control tasks. In DDPG, the agent explores the environment by adding Gaussian white noise to its actions in the action space. In this paper, to further improve the efficiency of exploration, we inject factorized Gaussian noise directly into the policy space and propose a novel dithering perturbation scheme that influences subsequent states, resulting in more diverse trajectories. The noise is sampled from a Gaussian distribution and learned together with the network weights. To guarantee the effectiveness of the perturbation, the same perturbation ratio is used on all layers. Our method does not require augmenting the environment's reward signal with an additional intrinsic motivation term, and the agent can learn directly from the environment. The proposed DDPG with exploratory noise in the policy space was tested on continuous control tasks, and the experimental results demonstrate that it achieves better performance than methods with no noise or with action space noise.
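To make the perturbation concrete, below is a minimal sketch (not the authors' code) of a linear policy layer perturbed by factorized Gaussian noise, in the spirit of the policy space noise described in the abstract. The names NoisyLinear, sigma_init, and scale_noise are illustrative assumptions; in the paper, the noise scale parameters would be trained jointly with the ordinary weights by the DDPG optimizer.

```python
import numpy as np


def scale_noise(size):
    """f(x) = sign(x) * sqrt(|x|), applied to a standard Gaussian sample."""
    x = np.random.randn(size)
    return np.sign(x) * np.sqrt(np.abs(x))


class NoisyLinear:
    """Linear layer whose parameters are perturbed by factorized Gaussian noise."""

    def __init__(self, in_features, out_features, sigma_init=0.5):
        # Mean parameters (the usual weights) and per-parameter noise scales.
        bound = 1.0 / np.sqrt(in_features)
        self.w_mu = np.random.uniform(-bound, bound, (out_features, in_features))
        self.b_mu = np.random.uniform(-bound, bound, out_features)
        self.w_sigma = np.full((out_features, in_features), sigma_init / np.sqrt(in_features))
        self.b_sigma = np.full(out_features, sigma_init / np.sqrt(in_features))
        self.in_features = in_features
        self.out_features = out_features
        self.reset_noise()

    def reset_noise(self):
        # Factorized noise: one vector per input and one per output, combined
        # by an outer product instead of sampling a full noise matrix.
        eps_in = scale_noise(self.in_features)
        eps_out = scale_noise(self.out_features)
        self.w_eps = np.outer(eps_out, eps_in)
        self.b_eps = eps_out

    def forward(self, x):
        # Perturbed parameters: mean plus learned scale times sampled noise.
        w = self.w_mu + self.w_sigma * self.w_eps
        b = self.b_mu + self.b_sigma * self.b_eps
        return x @ w.T + b


# Example: because the noise lives in the policy parameters rather than the
# action, resampling it between episodes changes the mapping from states to
# actions, which in turn changes the states visited later in the episode.
layer = NoisyLinear(in_features=4, out_features=2)
state = np.random.randn(4)
print(layer.forward(state))
layer.reset_noise()
print(layer.forward(state))
```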
Acknowledgement
This work was funded by the National Natural Science Foundation (61272005, 61303108, 61373094, 61502323, 61472262), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201422). We would also like to thank the reviewers for their helpful comments.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Yan, Y., Liu, Q. (2018). Policy Space Noise in Deep Deterministic Policy Gradient. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11302. Springer, Cham. https://doi.org/10.1007/978-3-030-04179-3_55
DOI: https://doi.org/10.1007/978-3-030-04179-3_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04178-6
Online ISBN: 978-3-030-04179-3
eBook Packages: Computer Science, Computer Science (R0)