An Infinite Deep Boltzmann Machine

Research article · Published: 23 March 2018 · DOI: 10.1145/3193077.3193081

ABSTRACT

The deep Boltzmann machine (DBM) is a powerful "deep" probabilistic model that learns a hierarchical representation of the data. However, choosing the size of each hidden layer of a DBM is difficult, as the proper model size varies from task to task; choosing it is an essential model selection problem for latent variable graphical models. This paper proposes a new variant of the DBM, called the infinite deep Boltzmann machine (iDBM), which can freely change the number of hidden units participating in the energy function of each layer. A greedy training method is proposed to pre-train the model, after which the size of each layer is fixed and the model is converted into an ordinary DBM. Experimental results on MNIST and Caltech 101 Silhouettes indicate that the iDBM can learn generative and discriminative models as good as those of the original DBM, while eliminating the need for model selection over the hidden layer sizes of DBMs.
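As background for the abstract: a two-layer DBM assigns an energy to each joint configuration of visible units v and hidden layers h1 and h2, and the iDBM's central idea is that only a variable number of leading units in each hidden layer participate in that energy. The NumPy sketch below illustrates such a truncated two-layer energy. It is a hypothetical reconstruction, not code from the paper; the function name, the slicing scheme, and the omission of any per-unit penalty term are all assumptions.

```python
import numpy as np

def idbm_energy(v, h1, h2, W1, W2, b, c1, c2, z1, z2):
    """Energy of a two-layer DBM in which only the first z1 (resp. z2)
    units of hidden layer 1 (resp. 2) participate in the energy.
    Hypothetical sketch: names and truncation scheme are assumptions,
    and any per-unit growth penalty from the iDBM is omitted."""
    h1a, h2a = h1[:z1], h2[:z2]              # currently active hidden units
    energy = -(b @ v) - (c1[:z1] @ h1a) - (c2[:z2] @ h2a)  # bias terms
    energy -= v @ W1[:, :z1] @ h1a           # visible-to-layer-1 coupling
    energy -= h1a @ W2[:z1, :z2] @ h2a       # layer-1-to-layer-2 coupling
    return energy

# Example: random binary configuration; 3 of 5 layer-1 units and
# 2 of 4 layer-2 units are active in the energy.
rng = np.random.default_rng(0)
v  = rng.integers(0, 2, size=8).astype(float)
h1 = rng.integers(0, 2, size=5).astype(float)
h2 = rng.integers(0, 2, size=4).astype(float)
W1, W2 = rng.normal(size=(8, 5)), rng.normal(size=(5, 4))
b, c1, c2 = rng.normal(size=8), rng.normal(size=5), rng.normal(size=4)
print(idbm_energy(v, h1, h2, W1, W2, b, c1, c2, z1=3, z2=2))
```

Letting z1 and z2 grow during training corresponds to adding hidden units to the energy function; once they stop changing, the truncated model is an ordinary finite DBM, which matches the paper's description of fixing the layer sizes after greedy pre-training.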


Published in

ICCDA '18: Proceedings of the 2nd International Conference on Compute and Data Analysis
March 2018, 94 pages
ISBN: 9781450363594
DOI: 10.1145/3193077
Copyright © 2018 Association for Computing Machinery, New York, NY, United States
