Abstract
In this work, we present intermediate results of applying the Kronecker-Factored Approximate Curvature (K-FAC) algorithm to the Q-learning problem. Although K-FAC is more expensive to compute per step than plain stochastic gradient descent, it allows the agent to converge in somewhat fewer epochs than Adam on simple reinforcement learning tasks, and it tends to be more stable and less sensitive to the choice of hyperparameters. Our latest results show that DDQN with K-FAC learns more quickly and improves steadily, in contrast to the same agent trained with Adam or RMSProp.
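As a rough illustration of the kind of update the abstract refers to, the following is a minimal NumPy sketch of the Kronecker-factored natural-gradient step for a single dense layer. The function name, shapes, learning rate, and damping value are our own illustrative choices, not taken from the paper; a practical implementation would also amortize the inverses across steps.

```python
import numpy as np

def kfac_update(W, dW, a_batch, g_batch, damping=1e-2, lr=0.1):
    """One K-FAC-style step for a dense layer with weights W (out x in).

    a_batch: layer inputs (N x in); g_batch: gradients w.r.t. layer
    outputs (N x out). K-FAC approximates the Fisher matrix of the layer
    as the Kronecker product G (x) A of two small covariance factors.
    """
    # A: covariance of layer inputs; G: covariance of output gradients.
    # Damping keeps both factors invertible.
    A = a_batch.T @ a_batch / len(a_batch) + damping * np.eye(a_batch.shape[1])
    G = g_batch.T @ g_batch / len(g_batch) + damping * np.eye(g_batch.shape[1])
    # (G (x) A)^-1 vec(dW) corresponds to G^-1 @ dW @ A^-1 in matrix form,
    # so the natural gradient never requires forming the full Fisher matrix.
    nat_grad = np.linalg.solve(G, dW) @ np.linalg.inv(A)
    return W - lr * nat_grad
```

With zero statistics and unit damping both factors collapse to identities and the step reduces to plain gradient descent, which is a quick sanity check on the algebra.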
Acknowledgments
We would like to thank Olga Tushkanova for helpful comments, constructive criticism, and useful feedback. The results of this work were obtained using the computational resources of the Peter the Great Saint-Petersburg Polytechnic University Supercomputing Center (www.scc.spbstu.ru). The research was partially funded by the 5-100-2020 program and SPbPU University.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Beltiukov, R. (2020). Optimizing Q-Learning with K-FAC Algorithm. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2019. Communications in Computer and Information Science, vol 1086. Springer, Cham. https://doi.org/10.1007/978-3-030-39575-9_1
Print ISBN: 978-3-030-39574-2
Online ISBN: 978-3-030-39575-9
eBook Packages: Computer Science, Computer Science (R0)