
Stable Training of Bellman Error in Reinforcement Learning

  • Conference paper

Neural Information Processing (ICONIP 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1333)

Abstract

Optimizing the Bellman error is, in principle, the key to value function learning, yet it often suffers from unstable training and slow convergence. In this paper, we study the problem of optimizing the Bellman error distribution, with the aim of stabilizing Bellman error training. We propose a framework in which the Bellman error distribution at the current step is constrained to approximate the distribution at the previous step, under the hypothesis that the Bellman error follows a stationary random process once training converges; this constraint stabilizes value function learning. We then minimize the distance between the two distributions with Stein Variational Gradient Descent (SVGD), which also helps balance exploration and exploitation in parameter space, and we incorporate the framework into the advantage actor-critic (A2C) algorithm. Experimental results on discrete control problems show that our algorithm achieves higher average returns and smaller Bellman errors than both the A2C baseline and the anchor method, and that it stabilizes the training process.
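The abstract names two computational ingredients: the one-step Bellman (TD) error used by the critic, and an SVGD update that moves a set of parameter particles so that the induced Bellman-error distribution stays close to the one from the previous training step. The sketch below is not the authors' implementation (their code is linked in the notes); it is a minimal NumPy illustration of those two pieces, and every name in it (bellman_errors, rbf_kernel, svgd_step, score_fn) is a hypothetical placeholder.

```python
import numpy as np

def bellman_errors(values, next_values, rewards, dones, gamma=0.99):
    """One-step Bellman (TD) errors: delta = r + gamma * V(s') * (1 - done) - V(s)."""
    return rewards + gamma * next_values * (1.0 - dones) - values

def rbf_kernel(particles, h=None):
    """RBF kernel matrix K[j, i] = k(x_j, x_i) and its gradient w.r.t. x_j."""
    diffs = particles[:, None, :] - particles[None, :, :]    # (n, n, d): x_j - x_i
    sq_dists = np.sum(diffs ** 2, axis=-1)                   # (n, n)
    if h is None:                                             # median heuristic for the bandwidth
        h = np.sqrt(0.5 * np.median(sq_dists) / np.log(len(particles) + 1.0) + 1e-8)
    K = np.exp(-sq_dists / (2.0 * h ** 2))
    grad_K = -diffs / (h ** 2) * K[:, :, None]                # d k(x_j, x_i) / d x_j
    return K, grad_K

def svgd_step(particles, score_fn, step_size=1e-3):
    """One SVGD update on an (n, d) array of parameter particles.

    score_fn(particles) returns grad_x log p(x) for every particle, where p is the
    (unnormalized) target density -- in the paper's setting one would use a density
    over critic parameters that keeps the current Bellman-error distribution close
    to the previous one.
    """
    n = len(particles)
    K, grad_K = rbf_kernel(particles)
    scores = score_fn(particles)                              # (n, d)
    # phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K.T @ scores + grad_K.sum(axis=0)) / n
    return particles + step_size * phi

# Toy usage: particles drawn toward a standard normal target (score = -x).
rng = np.random.default_rng(0)
theta = rng.normal(size=(32, 2))
for _ in range(200):
    theta = svgd_step(theta, score_fn=lambda x: -x, step_size=0.1)
```

The kernel term in phi pulls particles toward high-density regions, while the gradient-of-kernel term pushes them apart, which is the exploration/exploitation balance in parameter space that the abstract refers to.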


Notes

  1. https://github.com/2019ChenGong/SVA2C.

  2. https://github.com/openai/baselines/tree/master/baselines/a2c.

  3. https://github.com/TeaPearce.


Author information


Corresponding author

Correspondence to Xinwen Hou.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Gong, C., Bai, Y., Hou, X., Ji, X. (2020). Stable Training of Bellman Error in Reinforcement Learning. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1333. Springer, Cham. https://doi.org/10.1007/978-3-030-63823-8_51


  • DOI: https://doi.org/10.1007/978-3-030-63823-8_51

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63822-1

  • Online ISBN: 978-3-030-63823-8

  • eBook Packages: Computer Science, Computer Science (R0)
