
Policy Space Noise in Deep Deterministic Policy Gradient

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11302)


Abstract

The deep deterministic policy gradient (DDPG) algorithm is an attractive reinforcement learning method that directly optimizes the policy and performs well on many continuous control tasks. In DDPG, the agent explores the environment by adding Gaussian white noise to its actions in the action space. In this paper, to further improve the efficiency of exploration, we inject factorized Gaussian noise directly into the policy space and propose a novel dithering perturbation scheme whose effect carries over to subsequent states, resulting in richer trajectories. The noise is sampled from a Gaussian distribution and learned together with the network weights. To guarantee the effectiveness of the perturbation, the same perturbation ratio is used on all layers. Our method does not require augmenting the environment's reward signal with an additional intrinsic motivation term, and the agent can learn directly from the environment. The proposed DDPG with exploratory noise in the policy space was tested on continuous control tasks. The experimental results demonstrate that it achieves better performance than methods with no noise or with action space noise.
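The abstract describes perturbing the policy in parameter space with factorized Gaussian noise whose scale parameters are learned jointly with the network weights. The following is a minimal, hypothetical PyTorch sketch of how such a factorized-noise layer could be wired into a DDPG actor; the class names, the sigma_init value, and the three-layer actor architecture are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch only: a factorized Gaussian noisy linear layer (NoisyNet-style) used
# in place of ordinary linear layers in a DDPG actor, so exploration noise
# lives in the policy (parameter) space rather than the action space.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class FactorizedNoisyLinear(nn.Module):
    def __init__(self, in_features, out_features, sigma_init=0.017):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Learnable mean weights plus learnable noise scales; the scales are
        # trained jointly with the other network weights.
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.full((out_features, in_features), sigma_init))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.full((out_features,), sigma_init))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)

    @staticmethod
    def _f(x):
        # Factorized-noise transform f(x) = sign(x) * sqrt(|x|).
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        # Factorized Gaussian noise: one vector per input and one per output,
        # combined via an outer product instead of sampling a full matrix.
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.weight_mu + self.weight_sigma * torch.outer(eps_out, eps_in)
        bias = self.bias_mu + self.bias_sigma * eps_out
        return F.linear(x, weight, bias)


class NoisyActor(nn.Module):
    # Hypothetical DDPG actor with policy-space noise applied on every layer.
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.l1 = FactorizedNoisyLinear(state_dim, hidden)
        self.l2 = FactorizedNoisyLinear(hidden, hidden)
        self.l3 = FactorizedNoisyLinear(hidden, action_dim)

    def forward(self, state):
        x = F.relu(self.l1(state))
        x = F.relu(self.l2(x))
        return torch.tanh(self.l3(x))  # actions bounded in [-1, 1]
```

Because the noise perturbs the actor's weights rather than its output, a single noise sample shifts the whole policy, so the perturbation influences every subsequent action and state in the episode; this is the trajectory-level effect the abstract attributes to policy-space noise.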




Acknowledgement

This work was funded by the National Natural Science Foundation (61272005, 61303108, 61373094, 61472262, 61502323), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201422). We would also like to thank the reviewers for their helpful comments.

Author information

Corresponding author

Correspondence to Quan Liu.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Yan, Y., Liu, Q. (2018). Policy Space Noise in Deep Deterministic Policy Gradient. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11302. Springer, Cham. https://doi.org/10.1007/978-3-030-04179-3_55


  • DOI: https://doi.org/10.1007/978-3-030-04179-3_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04178-6

  • Online ISBN: 978-3-030-04179-3

  • eBook Packages: Computer Science, Computer Science (R0)
