
Incorporating Explanations to Balance the Exploration and Exploitation of Deep Reinforcement Learning

  • Conference paper
Knowledge Science, Engineering and Management (KSEM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13369)

Abstract

Discovering efficient exploration strategies is a central challenge in reinforcement learning (RL). Deep reinforcement learning (DRL) methods proposed in recent years have mainly focused on improving the generalization of models while neglecting their explainability. In this study, an embedding explanation for the advantage actor-critic algorithm (EEA2C) is proposed to balance exploration and exploitation in DRL models. Specifically, the proposed algorithm first explains the agent’s actions and then employs the explanations to guide exploration. A fusion strategy is designed to retain, from experience, the information that is helpful for exploration. Based on the output of the fusion strategy, a variational autoencoder (VAE) encodes the task-related explanation into a probabilistic latent representation. This latent representation is finally incorporated into the agent’s policy as prior knowledge. Experimental results on six Atari environments show that the proposed method improves the agent’s exploratory capabilities with explainable knowledge.
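The abstract outlines a four-step pipeline: explain the agent's actions, fuse the explanation with experience, encode the fused result with a VAE into a probabilistic latent representation, and condition the actor-critic policy on that latent as prior knowledge. The PyTorch sketch below is only a hypothetical illustration of that pipeline, not the authors' implementation: the gradient-saliency explanation, the element-wise fusion, and every module name and dimension (ExplanationVAE, ActorCritic, explanation_mask, obs_dim=64, latent_dim=32) are assumptions made for the example.

```python
# Hypothetical sketch of the pipeline described in the abstract (not the authors' code):
# (1) explain the agent's actions, (2) fuse the explanation with experience,
# (3) encode the fused result with a VAE into a latent z, (4) condition the
# advantage actor-critic policy on z as prior knowledge.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExplanationVAE(nn.Module):
    """Encodes an explanation-weighted observation into a probabilistic latent z."""

    def __init__(self, obs_dim: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.log_var = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, obs_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterisation
        return self.decoder(z), mu, log_var, z


class ActorCritic(nn.Module):
    """A2C head that consumes the observation concatenated with the VAE latent."""

    def __init__(self, obs_dim: int, latent_dim: int, n_actions: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim + latent_dim, 128), nn.ReLU())
        self.policy = nn.Linear(128, n_actions)
        self.value = nn.Linear(128, 1)

    def forward(self, obs, z):
        h = self.body(torch.cat([obs, z], dim=-1))
        return torch.distributions.Categorical(logits=self.policy(h)), self.value(h)


def explanation_mask(actor_critic, obs, z, action):
    """Gradient saliency as a stand-in for the paper's explanation step."""
    obs = obs.clone().requires_grad_(True)
    dist, _ = actor_critic(obs, z)
    dist.log_prob(action).sum().backward()  # parameter grads from this pass would be cleared before the RL update
    return obs.grad.abs()  # per-feature relevance to the chosen actions


if __name__ == "__main__":
    obs_dim, latent_dim, n_actions = 64, 32, 6
    vae = ExplanationVAE(obs_dim, latent_dim)
    ac = ActorCritic(obs_dim, latent_dim, n_actions)

    obs = torch.randn(8, obs_dim)                     # a batch of observations from experience
    z0 = torch.zeros(8, latent_dim)                   # no prior knowledge yet
    dist, _ = ac(obs, z0)
    action = dist.sample()

    mask = explanation_mask(ac, obs, z0, action)      # step 1: explain the actions
    fused = obs * mask                                # step 2: simple fusion of experience and explanation
    recon, mu, log_var, z = vae(fused)                # step 3: encode explanation into latent z

    # VAE loss (reconstruction + KL) would be added to the usual A2C losses.
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1).mean()
    vae_loss = F.mse_loss(recon, fused) + kl

    dist, value = ac(obs, z.detach())                 # step 4: policy conditioned on z
    print(vae_loss.item(), value.mean().item())
```

How the fusion and the training losses are actually defined in EEA2C is specified in the paper itself; the element-wise product and the loss weighting above are placeholders chosen only to make the sketch self-contained.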



Acknowledgements

This work is sponsored by the Shanghai Sailing Program (No. 20YF1413800).

Author information

Corresponding author

Correspondence to Xinzhi Wang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, X., Liu, Y., Chang, Y., Jiang, C., Zhang, Q. (2022). Incorporating Explanations to Balance the Exploration and Exploitation of Deep Reinforcement Learning. In: Memmi, G., Yang, B., Kong, L., Zhang, T., Qiu, M. (eds) Knowledge Science, Engineering and Management. KSEM 2022. Lecture Notes in Computer Science (LNAI), vol. 13369. Springer, Cham. https://doi.org/10.1007/978-3-031-10986-7_16

  • DOI: https://doi.org/10.1007/978-3-031-10986-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10985-0

  • Online ISBN: 978-3-031-10986-7

  • eBook Packages: Computer Science, Computer Science (R0)
