Abstract
Restricted Boltzmann machines (RBMs) have been successfully applied to unsupervised learning and to density modeling of images. The pre-training step for an RBM aims to find, from the sample data, an unknown stationary distribution with the lowest energy. However, conventional RBM pre-training is sensitive to the initial weights and biases: the choice of initial values directly affects both the capability and the efficiency of the learning process. This paper uses principal component analysis (PCA) to capture the principal directions of the training data. A set of initial parameter values for the RBM is then obtained so that the RBM computes the same reconstruction of the data as the principal components do. Experiments on the Yale and MNIST datasets show that the proposed method not only retains a strong learning ability but also significantly speeds up learning.
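The abstract does not spell out how the principal directions are turned into RBM parameters, so the following is only a minimal sketch under stated assumptions, not the paper's procedure. It assumes the weight matrix is set to the leading principal directions, the visible bias to the data mean, and the hidden bias so that the hidden pre-activation equals the PCA projection of the centered input. The function name `pca_init_rbm` and its parameters are hypothetical.

```python
import numpy as np


def pca_init_rbm(X, n_hidden):
    """Illustrative PCA-based initial RBM parameters (assumed scheme, not the paper's)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_visible = X.shape
    if n_hidden > min(n_samples, n_visible):
        raise ValueError("n_hidden cannot exceed min(n_samples, n_visible)")

    mean = X.mean(axis=0)
    Xc = X - mean  # center the data before extracting principal directions

    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)

    W = Vt[:n_hidden].T        # assumption: weights = top principal directions
    b_visible = mean.copy()    # assumption: visible bias = data mean
    b_hidden = -(mean @ W)     # assumption: x @ W + b_hidden == (x - mean) @ W
    return W, b_visible, b_hidden


# Usage on synthetic data (shapes chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.random((50, 8))
W, b_visible, b_hidden = pca_init_rbm(X, n_hidden=3)
```

With these values, the linear map `(X @ W + b_hidden) @ W.T + b_visible` reproduces the rank-3 PCA reconstruction of `X`. An actual RBM applies a sigmoid nonlinearity and stochastic sampling, so this linear agreement is only a starting point for training such as contrastive divergence.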
Acknowledgments
This work was supported by the National Science Foundation of China under Grants 61375065, 61432014 and 61432012.
Ethics declarations
Conflict of interest
None.
Ethical standards
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Xie, C., Lv, J. & Li, X. Finding a good initial configuration of parameters for restricted Boltzmann machine pre-training. Soft Comput 21, 6471–6479 (2017). https://doi.org/10.1007/s00500-016-2205-z
DOI: https://doi.org/10.1007/s00500-016-2205-z