
Optimizing Q-Learning with K-FAC Algorithm

  • Conference paper
Analysis of Images, Social Networks and Texts (AIST 2019)

Abstract

In this work, we present intermediate results of applying the Kronecker-Factored Approximate Curvature (K-FAC) algorithm to the Q-learning problem. Although more expensive to compute per step than plain stochastic gradient descent, K-FAC allows the agent to converge in fewer epochs than Adam on simple reinforcement learning tasks, and it tends to be more stable and less sensitive to hyperparameter selection. Our latest results show that DDQN trained with K-FAC learns more quickly than with the other optimizers and keeps improving steadily, in contrast to the same agent trained with Adam or RMSProp.
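For readers unfamiliar with the setup, the sketch below illustrates the kind of update the paper studies: a standard Double DQN (DDQN) training step in which the optimizer object can be swapped between Adam, RMSProp, or a K-FAC implementation. This is a minimal illustration written in PyTorch, not the paper's code (the published implementation is linked in the Notes below); the network interface, batch layout, and the availability of a K-FAC optimizer class with the standard `torch.optim` interface are assumptions.

```python
import torch
import torch.nn as nn

def ddqn_update(online_net: nn.Module,
                target_net: nn.Module,
                optimizer: torch.optim.Optimizer,
                batch,
                gamma: float = 0.99) -> float:
    """One Double DQN update step.

    `optimizer` is any object with the standard torch.optim interface
    (Adam, RMSprop, or a third-party K-FAC optimizer -- the latter is an
    assumption, since K-FAC is not part of core PyTorch).
    """
    states, actions, rewards, next_states, dones = batch

    # Q-values of the actions actually taken, under the online network.
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double Q-learning: the online network selects the greedy action,
        # the target network evaluates it.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    loss = nn.functional.smooth_l1_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With this structure, comparing optimizers amounts to changing only the `optimizer` argument, which mirrors the comparison between K-FAC, Adam, and RMSProp described in the abstract.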


Notes

  1. https://github.com/maybe-hello-world/qfac.


Acknowledgments

We would like to thank Olga Tushkanova for helpful comments, constructive criticism, and useful feedback. The results of this work were obtained using the computational resources of the Peter the Great Saint-Petersburg Polytechnic University Supercomputing Center (www.scc.spbstu.ru). The research was partially funded by the 5-100-2020 program and SPbPU.

Author information

Corresponding author

Correspondence to Roman Beltiukov.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Beltiukov, R. (2020). Optimizing Q-Learning with K-FAC Algorithm. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Communications in Computer and Information Science, vol 1086. Springer, Cham. https://doi.org/10.1007/978-3-030-39575-9_1


  • DOI: https://doi.org/10.1007/978-3-030-39575-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39574-2

  • Online ISBN: 978-3-030-39575-9

  • eBook Packages: Computer Science, Computer Science (R0)
