Abstract
The natural gradient is a powerful method for improving the transient dynamics of learning by exploiting the geometric structure of the parameter space. Many natural gradient methods have been developed for maximum likelihood learning, which is based on the Kullback-Leibler (KL) divergence and its Fisher metric. However, these methods require computing the normalization constant and are therefore inapplicable to statistical models whose normalization constant is analytically intractable. In this study, we extend the natural gradient framework to divergences for unnormalized statistical models: score matching and ratio matching. In addition, we derive novel adaptive natural gradient algorithms that avoid the computationally demanding inversion of the metric, and we demonstrate their effectiveness in numerical experiments. In particular, experiments on a multi-layer neural network model demonstrate that the proposed method escapes from plateaus much faster than conventional stochastic gradient descent.
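To convey the flavor of an adaptive natural-gradient scheme that sidesteps explicit metric inversion, here is a minimal sketch. This is not the paper's algorithm: the rank-one inverse-metric update follows the spirit of the well-known adaptive rule of Amari, Park, and Fukumizu (2000), and the function name, step sizes, and quadratic test problem are all illustrative assumptions.

```python
import numpy as np

def adaptive_natural_gradient_step(theta, grad, G_inv, lr=0.01, eps=0.01):
    """One step of natural gradient descent with an adaptively
    estimated inverse metric, avoiding explicit matrix inversion.

    G_inv is updated with a rank-one correction so that it tracks
    the inverse of a running average of grad grad^T (an empirical
    Fisher-like metric), to first order in eps.
    """
    g = grad.reshape(-1, 1)  # gradient as a column vector
    # Rank-one inverse-metric update:
    #   G_inv <- (1 + eps) * G_inv - eps * (G_inv g)(G_inv g)^T
    Gg = G_inv @ g
    G_inv = (1.0 + eps) * G_inv - eps * (Gg @ Gg.T)
    # Natural-gradient step: precondition the gradient by G_inv.
    theta = theta - lr * (G_inv @ g).ravel()
    return theta, G_inv
```

Under these assumptions, the step behaves like ordinary gradient descent while the inverse-metric estimate is close to the identity, and becomes increasingly Fisher-preconditioned as the estimate adapts; the cost per step is O(d^2) rather than the O(d^3) of a direct inversion.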
Acknowledgments
This work was supported by a Grant-in-Aid for JSPS Fellows (No. 14J08282) from the Japan Society for the Promotion of Science (JSPS).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Karakida, R., Okada, M., Amari, S.-I. (2016). Adaptive Natural Gradient Learning Algorithms for Unnormalized Statistical Models. In: Villa, A., Masulli, P., Pons Rivero, A. (eds.) Artificial Neural Networks and Machine Learning – ICANN 2016. Lecture Notes in Computer Science, vol. 9886. Springer, Cham. https://doi.org/10.1007/978-3-319-44778-0_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44777-3
Online ISBN: 978-3-319-44778-0