GAN-Based Planning Model in Deep Reinforcement Learning

Chen, Song; Jiang, Junpeng; Zhang, Xiaofang; Wu, Jinjin; Lu, Gongzheng

doi:10.1007/978-3-030-61616-8_26


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12397)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Deep reinforcement learning methods have achieved unprecedented success in many sequential decision-making tasks with high-dimensional, large-scale spaces. Among these methods, model-based methods rely on planning as their primary component, while model-free methods rely primarily on learning. However, the accuracy of the environment model has a significant impact on the learned policy: when the model is incorrect, the planning process is likely to compute a suboptimal policy. To obtain a more accurate environment model, this paper introduces the GAN-based Planning Model (GBPM), which exploits the strong expressive ability of the Generative Adversarial Net (GAN) to learn to simulate the environment from experience and to construct implicit planning. The GBPM can be trained on real transition samples experienced by the agent. The agent can then use the GBPM to produce simulated experience or trajectories and thereby improve the learned policy. Because the GBPM can play the role of experience replay, it can be applied to both model-based and model-free methods, such as Dyna, DQN, and ACER. Experimental results indicate that the GBPM can improve data efficiency and algorithm performance in the Maze and Atari 2600 game domains.
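The loop described in the abstract (train a model on real transitions, then draw simulated transitions from it to improve the policy) can be illustrated with a minimal Dyna-Q sketch. This is not the authors' implementation: the `TableModel` class below is a hypothetical lookup-table stand-in for the learned model, not a GAN, and the corridor environment, function names, and hyperparameters are all assumptions chosen for illustration only.

```python
import random
from collections import defaultdict


class TableModel:
    """Hypothetical stand-in for a learned environment model (NOT a GAN).

    It only memorizes observed transitions; a generative model would
    expose the same train/sample interface.
    """

    def __init__(self):
        self.transitions = {}  # (state, action) -> (reward, next_state)

    def train(self, state, action, reward, next_state):
        # Learn from one real transition experienced by the agent.
        self.transitions[(state, action)] = (reward, next_state)

    def sample(self, rng):
        # Produce one simulated transition (state, action, reward, next_state).
        state, action = rng.choice(list(self.transitions))
        reward, next_state = self.transitions[(state, action)]
        return state, action, reward, next_state


def step(state, action, n_states):
    """Toy corridor (assumed): action 1 moves right, action 0 moves left."""
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    done = next_state == n_states - 1  # reaching the right end ends the episode
    return next_state, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s2, alpha, gamma):
    # One-step Q-learning update, used for both real and simulated transitions.
    best_next = max(q[(s2, 0)], q[(s2, 1)])
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def dyna_q(episodes=30, planning_steps=10, n_states=6,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    model = TableModel()
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[(s, 0)] == q[(s, 1)]:
                a = rng.randrange(2)
            else:
                a = 0 if q[(s, 0)] > q[(s, 1)] else 1
            s2, r, done = step(s, a, n_states)
            q_update(q, s, a, r, s2, alpha, gamma)  # learn from real experience
            model.train(s, a, r, s2)                # update the model
            for _ in range(planning_steps):         # plan with simulated experience
                q_update(q, *model.sample(rng), alpha, gamma)
            s = s2
    return q
```

Calling `dyna_q()` returns the learned action values for the toy corridor. Under these assumptions, swapping `TableModel` for a generative model with the same `train` and `sample` methods would leave the rest of the loop unchanged, which is the sense in which a learned model can stand in for experience replay.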

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (61772263), the Suzhou Technology Development Plan (SYG201807), and the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, China
Song Chen, Junpeng Jiang, Xiaofang Zhang & Jinjin Wu
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Xiaofang Zhang & Gongzheng Lu

Corresponding author

Correspondence to Xiaofang Zhang.

Editor information

Editors and Affiliations

Department of Applied Informatics, Comenius University in Bratislava, Bratislava, Slovakia
Igor Farkaš
Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby, Denmark
Paolo Masulli
Department of Informatics, University of Hamburg, Hamburg, Germany
Stefan Wermter

Cite this paper

Chen, S., Jiang, J., Zhang, X., Wu, J., Lu, G. (2020). GAN-Based Planning Model in Deep Reinforcement Learning. In: Farkaš, I., Masulli, P., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2020. ICANN 2020. Lecture Notes in Computer Science, vol 12397. Springer, Cham. https://doi.org/10.1007/978-3-030-61616-8_26

DOI: https://doi.org/10.1007/978-3-030-61616-8_26
Published: 14 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61615-1
Online ISBN: 978-3-030-61616-8
eBook Packages: Computer Science, Computer Science (R0)
