Finding a good initial configuration of parameters for restricted Boltzmann machine pre-training

  • Methodologies and Application
  • Soft Computing

Abstract

Restricted Boltzmann machines (RBMs) have been successfully applied in unsupervised learning and image density-based modeling. The aim of the pre-training step for RBMs is to discover, from the sample data, an unknown stationary distribution that has the lowest energy. However, conventional RBM pre-training is sensitive to the initial weights and biases: the choice of initial values directly affects both the capability and the efficiency of the learning process. This paper uses principal component analysis (PCA) to capture the principal component directions of the training data. A set of initial parameter values for the RBM is then obtained by requiring the same reconstruction of the data. Experiments on the Yale and MNIST datasets show that the proposed method not only retains a strong learning ability but also significantly accelerates learning.
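The abstract does not give the formula that maps principal directions to RBM parameters, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the initial weight matrix is taken to be the top principal directions, the visible bias is the data mean, and the hidden bias is chosen so that hidden pre-activations are centered. The function name `pca_init` and all three choices are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def pca_init(X, n_hidden):
    """Hypothetical PCA-based starting point for RBM parameters.

    X: (n_samples, n_visible) data matrix.
    n_hidden: number of hidden units, at most min(n_samples, n_visible).
    Returns (W, b_visible, c_hidden). The mapping from principal
    directions to RBM parameters is an assumption of this sketch.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_hidden].T          # shape (n_visible, n_hidden)
    b_visible = mean.copy()      # assumed: visible bias = data mean
    c_hidden = -W.T @ mean       # assumed: X @ W + c equals (X - mean) @ W
    return W, b_visible, c_hidden

rng = np.random.default_rng(0)
X = rng.random((200, 16))        # synthetic stand-in for image data
W, b, c = pca_init(X, n_hidden=4)

# Linear PCA reconstruction that these initial parameters reproduce.
X_rec = (X @ W + c) @ W.T + b
```

With these choices, a purely linear pass through the initial parameters gives the rank-`n_hidden` PCA reconstruction of the data. A real RBM applies a sigmoid nonlinearity, so this matches the reconstruction only in the linear sense assumed here.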

Acknowledgments

This work was supported by the National Science Foundation of China under Grants 61375065, 61432014 and 61432012.

Author information

Corresponding author

Correspondence to Jiancheng Lv.

Ethics declarations

Conflict of interest

None.

Ethical standards

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Xie, C., Lv, J. & Li, X. Finding a good initial configuration of parameters for restricted Boltzmann machine pre-training. Soft Comput 21, 6471–6479 (2017). https://doi.org/10.1007/s00500-016-2205-z

  • DOI: https://doi.org/10.1007/s00500-016-2205-z