
Optimizing Q-Learning with K-FAC Algorithm

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2019)

Abstract

In this work, we present intermediate results of applying the Kronecker-Factored Approximate Curvature (K-FAC) algorithm to the Q-learning problem. Although more expensive to compute per step than plain stochastic gradient descent, K-FAC allows the agent to converge in fewer epochs than Adam on simple reinforcement learning tasks, and it tends to be more stable and less sensitive to hyperparameter selection. Our latest results show that DDQN trained with K-FAC learns more quickly than with the other optimizers and keeps improving steadily, in contrast to the same agent trained with Adam or RMSProp.
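For readers unfamiliar with the setup, the sketch below illustrates the kind of update the paper studies: a standard Double DQN (DDQN) training step in which the optimizer object can be swapped between Adam, RMSProp, or a K-FAC implementation. This is a minimal illustration written in PyTorch, not the paper's code (the published implementation is linked in the Notes below); the network interface, batch layout, and the availability of a K-FAC optimizer class with the standard `torch.optim` interface are assumptions.

```python
import torch
import torch.nn as nn

def ddqn_update(online_net: nn.Module,
                target_net: nn.Module,
                optimizer: torch.optim.Optimizer,
                batch,
                gamma: float = 0.99) -> float:
    """One Double DQN update step.

    `optimizer` is any object with the standard torch.optim interface
    (Adam, RMSprop, or a third-party K-FAC optimizer -- the latter is an
    assumption, since K-FAC is not part of core PyTorch).
    """
    states, actions, rewards, next_states, dones = batch

    # Q-values of the actions actually taken, under the online network.
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double Q-learning: the online network selects the greedy action,
        # the target network evaluates it.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    loss = nn.functional.smooth_l1_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With this structure, comparing optimizers amounts to changing only the `optimizer` argument, which mirrors the comparison between K-FAC, Adam, and RMSProp described in the abstract.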


Notes

  1. https://github.com/maybe-hello-world/qfac.


Acknowledgments

We would like to thank Olga Tushkanova for helpful comments, constructive criticism, and useful feedback. The results of this work were obtained using the computational resources of the Peter the Great Saint-Petersburg Polytechnic University Supercomputing Center (www.scc.spbstu.ru). The research was partially funded by the 5-100-2020 program and SPbPU.

Author information

Corresponding author

Correspondence to Roman Beltiukov.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Beltiukov, R. (2020). Optimizing Q-Learning with K-FAC Algorithm. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Communications in Computer and Information Science, vol 1086. Springer, Cham. https://doi.org/10.1007/978-3-030-39575-9_1


  • DOI: https://doi.org/10.1007/978-3-030-39575-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39574-2

  • Online ISBN: 978-3-030-39575-9

  • eBook Packages: Computer Science, Computer Science (R0)
