Empirical Analysis of the Divergence of Gibbs Sampling Based Learning Algorithms for Restricted Boltzmann Machines

Fischer, Asja; Igel, Christian

doi:10.1007/978-3-642-15825-4_26

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6354)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

Learning algorithms relying on Gibbs sampling based stochastic approximations of the log-likelihood gradient have become a common way to train Restricted Boltzmann Machines (RBMs). We study three of these methods: Contrastive Divergence (CD) and its refined variants Persistent CD (PCD) and Fast PCD (FPCD). As the approximations are biased, the maximum of the log-likelihood is not necessarily obtained. Recently, it has been shown that CD, PCD, and FPCD can even lead to a steady decrease of the log-likelihood during learning. Taking artificial data sets from the literature, we study these divergence effects in more detail. Our results indicate that the log-likelihood seems to diverge especially if the target distribution is difficult for the RBM to learn. The decrease of the likelihood cannot be detected by an increase of the reconstruction error, which has been proposed as a stopping criterion for CD learning. Weight decay with a carefully chosen weight-decay parameter can prevent divergence.
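As a rough illustration of the quantities the abstract refers to, the following sketch trains a very small binary RBM with CD-k and tracks both the exact log-likelihood and the reconstruction error. It is not the authors' experimental code: the class name `TinyRBM`, the toy data set, the learning rate, and the form of the weight decay (an L2 penalty on the weights only) are all assumptions made for the example. PCD and FPCD are not covered. The exact log-likelihood is only tractable here because the model has four visible units.

```python
import itertools
import math
import random


def sigmoid(x):
    return 0.5 * (1.0 + math.tanh(0.5 * x))  # numerically stable logistic function


def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))  # stable log(1 + e^x)


class TinyRBM:
    """Hypothetical binary RBM, small enough to enumerate all visible states."""

    def __init__(self, n_visible, n_hidden, rng):
        self.nv, self.nh, self.rng = n_visible, n_hidden, rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_input(self, v, j):
        return self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.nv))

    def p_h_given_v(self, v):
        return [sigmoid(self.hidden_input(v, j)) for j in range(self.nh)]

    def p_v_given_h(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        return [1 if self.rng.random() < p else 0 for p in probs]

    def log_unnorm_p(self, v):
        # log sum_h exp(-E(v, h)), with the hidden units summed out analytically
        return (sum(self.b[i] * v[i] for i in range(self.nv))
                + sum(softplus(self.hidden_input(v, j)) for j in range(self.nh)))

    def mean_log_likelihood(self, data):
        # Exact: the partition function is computed over all 2^nv visible states.
        logs = [self.log_unnorm_p(v)
                for v in itertools.product([0, 1], repeat=self.nv)]
        m = max(logs)
        log_z = m + math.log(sum(math.exp(l - m) for l in logs))
        return sum(self.log_unnorm_p(v) for v in data) / len(data) - log_z

    def reconstruction_error(self, data):
        # Mean squared distance between v and its one-step mean-field reconstruction.
        total = 0.0
        for v in data:
            v1 = self.p_v_given_h(self.p_h_given_v(v))
            total += sum((a - r) ** 2 for a, r in zip(v, v1))
        return total / len(data)

    def cd_step(self, data, k=1, lr=0.1, weight_decay=0.0):
        # One batch update with the CD-k gradient approximation.
        n = len(data)
        dW = [[0.0] * self.nh for _ in range(self.nv)]
        db, dc = [0.0] * self.nv, [0.0] * self.nh
        for v0 in data:
            ph0 = self.p_h_given_v(v0)
            vk, phk = list(v0), ph0
            for _ in range(k):  # k steps of block Gibbs sampling from the data
                vk = self.sample(self.p_v_given_h(self.sample(phk)))
                phk = self.p_h_given_v(vk)
            for i in range(self.nv):
                db[i] += v0[i] - vk[i]
                for j in range(self.nh):
                    dW[i][j] += v0[i] * ph0[j] - vk[i] * phk[j]
            for j in range(self.nh):
                dc[j] += ph0[j] - phk[j]
        for i in range(self.nv):
            self.b[i] += lr * db[i] / n
            for j in range(self.nh):
                # Assumed L2 weight decay, applied to the weights but not the biases.
                self.W[i][j] += lr * (dW[i][j] / n - weight_decay * self.W[i][j])
        for j in range(self.nh):
            self.c[j] += lr * dc[j] / n


# Toy data set (an assumption for this sketch, not one of the paper's benchmarks).
data = [(1, 1, 0, 0), (0, 0, 1, 1)] * 4
rbm = TinyRBM(4, 2, random.Random(0))
history = []
for epoch in range(200):
    rbm.cd_step(data, k=1, lr=0.1, weight_decay=1e-4)
    history.append((rbm.mean_log_likelihood(data), rbm.reconstruction_error(data)))
```

Plotting the two columns of `history` against the epoch index is one way to check whether the reconstruction error keeps falling while the exact log-likelihood does not. Whether divergence shows up on a toy problem this small depends on the data and hyperparameters chosen.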

Author information

Authors and Affiliations

Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780, Bochum, Germany
Asja Fischer & Christian Igel

Editor information

Editors and Affiliations

Department of Informatics, TEI of Thessaloniki, 57400, Sindos, Greece
Konstantinos Diamantaras
Department of Informatics, Nicolaus Copernicus University, School of Physics, Astronomy, and Informatics, ul. Grudziadzka 5, 87-100, Torun, Poland
Wlodek Duch
Department of Forestry and Management of the Environment and Natural Resources, Democritus University of Thrace, Pantazidou 193, 68200, Orestiada Thrace, Greece
Lazaros S. Iliadis

About this paper

Cite this paper

Fischer, A., Igel, C. (2010). Empirical Analysis of the Divergence of Gibbs Sampling Based Learning Algorithms for Restricted Boltzmann Machines. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds) Artificial Neural Networks – ICANN 2010. ICANN 2010. Lecture Notes in Computer Science, vol 6354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15825-4_26

DOI: https://doi.org/10.1007/978-3-642-15825-4_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15824-7
Online ISBN: 978-3-642-15825-4
eBook Packages: Computer Science, Computer Science (R0)
