Abstract
The deep deterministic policy gradient (DDPG) algorithm is an attractive reinforcement learning method that directly optimizes the policy and performs well on many continuous control tasks. In DDPG, the agent explores the environment by adding Gaussian white noise to its actions in the action space. In this paper, to further improve the efficiency of exploration, we inject factorized Gaussian noise directly into the policy space and propose a novel dithering perturbation scheme that influences subsequent states, resulting in more diverse trajectories. The noise is sampled from a Gaussian distribution and learned together with the network weights. To guarantee the effectiveness of the perturbation, the same perturbation ratio is used on all layers. Our method does not require augmenting the environment's reward signal with an additional intrinsic motivation term, and the agent can learn directly from the environment. The proposed DDPG with exploratory noise in the policy space was tested on continuous control tasks, and the experimental results demonstrate that it achieves better performance than methods with no noise or with action space noise.
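To make the perturbation concrete, below is a minimal sketch (not the authors' code) of a linear policy layer perturbed by factorized Gaussian noise, in the spirit of the policy space noise described in the abstract. The names NoisyLinear, sigma_init, and scale_noise are illustrative assumptions; in the paper, the noise scale parameters would be trained jointly with the ordinary weights by the DDPG optimizer.

```python
import numpy as np


def scale_noise(size):
    """f(x) = sign(x) * sqrt(|x|), applied to a standard Gaussian sample."""
    x = np.random.randn(size)
    return np.sign(x) * np.sqrt(np.abs(x))


class NoisyLinear:
    """Linear layer whose parameters are perturbed by factorized Gaussian noise."""

    def __init__(self, in_features, out_features, sigma_init=0.5):
        # Mean parameters (the usual weights) and per-parameter noise scales.
        bound = 1.0 / np.sqrt(in_features)
        self.w_mu = np.random.uniform(-bound, bound, (out_features, in_features))
        self.b_mu = np.random.uniform(-bound, bound, out_features)
        self.w_sigma = np.full((out_features, in_features), sigma_init / np.sqrt(in_features))
        self.b_sigma = np.full(out_features, sigma_init / np.sqrt(in_features))
        self.in_features = in_features
        self.out_features = out_features
        self.reset_noise()

    def reset_noise(self):
        # Factorized noise: one vector per input and one per output, combined
        # by an outer product instead of sampling a full noise matrix.
        eps_in = scale_noise(self.in_features)
        eps_out = scale_noise(self.out_features)
        self.w_eps = np.outer(eps_out, eps_in)
        self.b_eps = eps_out

    def forward(self, x):
        # Perturbed parameters: mean plus learned scale times sampled noise.
        w = self.w_mu + self.w_sigma * self.w_eps
        b = self.b_mu + self.b_sigma * self.b_eps
        return x @ w.T + b


# Example: because the noise lives in the policy parameters rather than the
# action, resampling it between episodes changes the mapping from states to
# actions, which in turn changes the states visited later in the episode.
layer = NoisyLinear(in_features=4, out_features=2)
state = np.random.randn(4)
print(layer.forward(state))
layer.reset_noise()
print(layer.forward(state))
```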
Acknowledgement
This work was funded by the National Natural Science Foundation (61272005, 61303108, 61373094, 61502323, 61472262), the Natural Science Foundation of Jiangsu (BK2012616), the High School Natural Foundation of Jiangsu (13KJB520020), the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172014K04), and the Suzhou Industrial Application of Basic Research Program (SYG201422). We would also like to thank the reviewers for their helpful comments.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Yan, Y., Liu, Q. (2018). Policy Space Noise in Deep Deterministic Policy Gradient. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11302. Springer, Cham. https://doi.org/10.1007/978-3-030-04179-3_55
DOI: https://doi.org/10.1007/978-3-030-04179-3_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04178-6
Online ISBN: 978-3-030-04179-3
eBook Packages: Computer Science, Computer Science (R0)