
Optimizing restricted Boltzmann machine learning by injecting Gaussian noise to likelihood gradient approximation


Abstract

Restricted Boltzmann machines (RBMs) can be trained by applying stochastic gradient ascent to the log-likelihood objective, i.e., by maximum likelihood learning. However, this is a difficult task because the gradient of the marginalization (partition) function is intractable. Several methods approximate this intractable term with Gibbs Markov chains, including Contrastive Divergence, Persistent Contrastive Divergence, and Fast Contrastive Divergence. In this paper, we propose an optimization that injects Gaussian noise into the underlying Monte Carlo estimation. We introduce two novel learning algorithms: Noisy Persistent Contrastive Divergence (NPCD) and Fast Noisy Persistent Contrastive Divergence (FNPCD). We prove that, under a satisfiable condition, the NPCD and FNPCD algorithms improve the average convergence to the equilibrium state. Our empirical investigation of diverse CD-based approaches shows that the proposed methods frequently achieve higher classification performance than traditional approaches on standard image classification benchmarks such as the MNIST, basic, and rotation datasets.
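As a purely hypothetical sketch (the abstract does not specify where the Gaussian noise enters the estimator), the idea of injecting noise into the persistent Gibbs chain of PCD might look as follows; all names (W, a, b, sigma, noise_std) are illustrative assumptions, and the conditionals are those of the Gaussian-visible RBM derived in the appendix.

```python
# Hypothetical sketch (not the authors' code) of a Gaussian-noise-injected
# persistent Gibbs step in the spirit of NPCD. The injection point is assumed:
# zero-mean Gaussian noise is added to the sampled visible state of the chain.
import numpy as np

def npcd_negative_phase(v_chain, W, a, b, sigma, noise_std, rng):
    # Hidden step: P(h_j = 1 | v) = sigmoid(sum_i v_i W_ij / sigma_i^2 + b_j)
    p_h = 1.0 / (1.0 + np.exp(-((v_chain / sigma**2) @ W + b)))
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Visible step: v_i | h ~ N(a_i + sum_j W_ij h_j, sigma_i^2)
    v_new = rng.normal(loc=a + W @ h, scale=sigma)
    # Assumed noise injection into the Monte Carlo sample (zero-mean Gaussian).
    v_new = v_new + rng.normal(0.0, noise_std, size=v_new.shape)
    p_h_new = 1.0 / (1.0 + np.exp(-((v_new / sigma**2) @ W + b)))
    return v_new, p_h_new  # next chain state and negative-phase hidden probabilities
```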


Notes

  1. Available online at http://yann.lecun.com/exdb/mnist/

  2. Available online at http://www-labs.iro.umontreal.ca/~lisa/icml2007data/mnist_rotation.zip


Author information


Corresponding author

Correspondence to Dae-Ki Kang.


Appendix: Complete derivative of BG-RBM

Energy function:

$$E\left( \textbf{v},\textbf{h}\right)_{BG} = \sum\limits_{i = 1}^{n_{v}}\frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}}-\sum\limits_{j = 1}^{n_{h}}b_{j}h_{j} -\sum\limits_{i = 1}^{n_{v}}\sum\limits_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j} $$
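For concreteness, a minimal NumPy sketch of evaluating this energy function; the array names v, h, a, b, W, and sigma mirror the symbols above and are illustrative only.

```python
# Illustrative sketch of the BG-RBM energy E(v, h):
#   sum_i (v_i - a_i)^2 / (2 sigma_i^2) - sum_j b_j h_j
#   - sum_ij (v_i / sigma_i^2) W_ij h_j
import numpy as np

def bg_rbm_energy(v, h, a, b, W, sigma):
    quad = np.sum((v - a) ** 2 / (2.0 * sigma ** 2))  # Gaussian visible term
    hid = np.dot(b, h)                                # hidden bias term
    inter = np.dot(v / sigma ** 2, W @ h)             # visible-hidden interaction
    return quad - hid - inter
```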

1.1 Definition of conditional probability P(h|v)

$$\begin{array}{@{}rcl@{}} P(\textbf{h}|\textbf{v}) &=& \frac{P(\textbf{v},\textbf{h})}{P(\textbf{v})} = \frac{\frac{1}{Z}e^{-E(\textbf{v},\textbf{h})}}{\frac{1}{Z}{\sum}_{\textbf{h}} e^{-E(\textbf{v},\textbf{h})}} = \frac{e^{-E(\textbf{v},\textbf{h})}}{{\sum}_{\textbf{h}}e^{-E(\textbf{v},\textbf{h})}} \\ & =& \frac{e^{-\left( {\sum}_{i = 1}^{n_{v}}\frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}}-{\sum}_{j = 1}^{n_{h}}b_{j}h_{j} -{\sum}_{i = 1}^{n_{v}}{\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) }}{{\sum}_{h} e^{-\left( {\sum}_{i = 1}^{n_{v}}\frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}}-{\sum}_{j = 1}^{n_{h}}b_{j}h_{j} -{\sum}_{i = 1}^{n_{v}}{\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) }} \\ & =& \frac{e^{-{\sum}_{i = 1}^{n_{v}} \left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\sum}_{h} e^{-{\sum}_{i = 1}^{n_{v}} \left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}} \end{array} $$

Rewriting the above equation in terms of a product-of-experts model, the term depending only on v cancels between numerator and denominator:

$$\begin{array}{@{}rcl@{}} P(\textbf{h}|\textbf{v}) & =& \frac{e^{-{\sum}_{i = 1}^{n_{v}}\frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}}}{\prod}_{j} e^{\left( {\sum}_{i = 1}^{n_{v}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}+b_{j}\right) h_{j}}}{e^{-{\sum}_{i = 1}^{n_{v}}\frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}}}{\prod}_{j} {\sum}_{h_{j}} e^{\left( {\sum}_{i = 1}^{n_{v}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}+b_{j}\right) h_{j}}} \\ & =& \prod\limits_{j} \frac{e^{\left( {\sum}_{i = 1}^{n_{v}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}+b_{j} \right) h_{j}}}{{\sum}_{h_{j}} e^{\left( {\sum}_{i = 1}^{n_{v}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}+b_{j} \right) h_{j}} } = \prod\limits_{j} P(h_{j}|\textbf{v}) \end{array} $$

For binary \(h_{j} \in \{0, 1\}\), \(P(h_{j} = 1|\textbf{v})\) is:

$$P(h_{j} = 1|\textbf{v}) = \frac{e^{\left( {\sum}_{i = 1}^{n_{v}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}+b_{j} \right) }}{e^{\left( {\sum}_{i = 1}^{n_{v}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}+b_{j} \right) } + e^{0}} = \text{sig}\left( \sum\limits_{i = 1}^{n_{v}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}+b_{j}\right) $$
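A minimal NumPy sketch of this conditional (illustrative, with the same symbol names as above); it computes P(h_j = 1 | v) for all hidden units at once and draws a binary sample.

```python
# Illustrative sketch: hidden conditional of the BG-RBM.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, b, W, sigma):
    # P(h_j = 1 | v) = sig( sum_i (v_i / sigma_i^2) W_ij + b_j )
    return sigmoid((v / sigma ** 2) @ W + b)

def sample_hidden(v, b, W, sigma, rng):
    p = hidden_probs(v, b, W, sigma)
    return (rng.random(p.shape) < p).astype(float)
```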

1.2 Definition of conditional probability P(v|h)

$$\begin{array}{@{}rcl@{}} P(\textbf{v}|\textbf{h}) & =& \frac{P(\textbf{v},\textbf{h})}{P(\textbf{h})} = \frac{\frac{1}{Z}e^{-E(\textbf{v},\textbf{h})}}{\frac{1}{Z}{\int}_{\textbf{v}} e^{-E(\textbf{v},\textbf{h})}dv} = \frac{e^{-E(\textbf{v},\textbf{h})}}{{\int}_{\textbf{v}}e^{-E(\textbf{v},\textbf{h})}dv} \\ & =& \frac{e^{-{\sum}_{i = 1}^{n_{v}} \left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\int}_{\textbf{v}} e^{-{\sum}_{i = 1}^{n_{v}} \left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}dv} \end{array} $$

Rewriting the above equation in terms of a product-of-experts model:

$$P(\textbf{v}|\textbf{h}) = \frac{{\prod}_{i} e^{-\left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\prod}_{i} {\int}_{\textbf{v}} e^{- \left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}dv} $$

Simplifying the denominator:

$$\begin{array}{@{}rcl@{}} P(\textbf{v}|\textbf{h}) & = & \frac{{\prod}_{i} e^{-\left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\prod}_{i} {\int}_{\textbf{v}} e^{-\left( \frac{1}{2{\sigma_{i}^{2}}}({v_{i}^{2}} - 2v_{i}a_{i} + {a_{i}^{2}})-{\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}} W_{ij}h_{j} \right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j} } dv } \\ & = & \frac{{\prod}_{i} e^{-\left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\prod}_{i} e^{\left( \frac{-{a_{i}^{2}}}{2{\sigma_{i}^{2}}}\right)+{\sum}_{j = 1}^{n_{h}}b_{j}h_{j}} {\int}_{\textbf{v}}e^{\frac{1}{2{\sigma_{i}^{2}}}\left( -{v_{i}^{2}} + 2v_{i}a_{i} \right)+{\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}} dv} \\ & = & \frac{{\prod}_{i} e^{-\left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\prod}_{i} e^{\left( \frac{-{a_{i}^{2}}}{2{\sigma_{i}^{2}}}\right)+{\sum}_{j = 1}^{n_{h}}b_{j}h_{j}} {\int}_{\textbf{v}}e^{-\frac{{v_{i}^{2}}}{2{\sigma_{i}^{2}}}} e^{v_{i} \left( \frac{a_{i}}{{\sigma_{i}^{2}}} + {\sum}_{j = 1}^{n_{h}}\frac{W_{ij}}{{\sigma_{i}^{2}}} h_{j} \right) } dv} \end{array} $$

Integrating the denominator with respect to \(v_{i}\) (a standard Gaussian integral) gives:

$$\begin{array}{@{}rcl@{}} & =& \frac{{\prod}_{i} e^{-\left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\prod}_{i} e^{\left( \frac{-{a_{i}^{2}}}{2{\sigma_{i}^{2}}}\right)+{\sum}_{j = 1}^{n_{h}}b_{j}h_{j}} e^{\frac{{\sigma_{i}^{2}} \left( \frac{a_{i}}{{\sigma_{i}^{2}}} + {\sum}_{j = 1}^{n_{h}}\frac{W_{ij}}{{\sigma_{i}^{2}}}h_{j}\right)^{2} }{2}} \left( \sqrt{2{\sigma_{i}^{2}}\pi}\right)} \\ & =& \frac{{\prod}_{i} e^{-\left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}}}{{\prod}_{i} \left( \sigma_{i}\sqrt{2\pi}\right) e^{\frac{1}{2{\sigma_{i}^{2}}} \left( {\sum}_{j = 1}^{n_{h}}W_{ij}h_{j} \right)^{2}+{\sum}_{j = 1}^{n_{h}}b_{j}h_{j}+\frac{a_{i}}{{\sigma_{i}^{2}}}{\sum}_{j = 1}^{n_{h}}W_{ij}h_{j}} } \end{array} $$

Simplifying the above equation:

$$\begin{array}{@{}rcl@{}} & =& \prod\limits_{i} \frac{1}{\sigma_{i}\sqrt{2\pi}} e^{-\left( \frac{(v_{i}-a_{i})^{2}}{2{\sigma_{i}^{2}}} - {\sum}_{j = 1}^{n_{h}}\frac{v_{i}}{{\sigma_{i}^{2}}}W_{ij}h_{j}\right) + {\sum}_{j = 1}^{n_{h}}b_{j}h_{j}-\left( \frac{1}{2{\sigma_{i}^{2}}} \left( {\sum}_{j = 1}^{n_{h}}W_{ij}h_{j} \right)^{2}+{\sum}_{j = 1}^{n_{h}}b_{j}h_{j}+\frac{a_{i}}{{\sigma_{i}^{2}}}{\sum}_{j = 1}^{n_{h}}W_{ij}h_{j}\right) } \\ & =& \prod\limits_{i} \frac{1}{\sigma_{i}\sqrt{2\pi}}e^{-\frac{1}{2{\sigma_{i}^{2}}} \left( v_{i} - \left( a_{i}+{\sum}_{j = 1}^{n_{h}}W_{ij}h_{j}\right) \right)^{2}} \end{array} $$

The above equation is the probability density function of a Gaussian distribution over each \(v_{i}\) with mean \(a_{i}+{\sum }_{j = 1}^{n_{h}}W_{ij}h_{j}\) and variance \({\sigma _{i}^{2}}\).
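Accordingly, sampling the visible units given h reduces to drawing each v_i from this Gaussian; a minimal NumPy sketch with the same illustrative names as above:

```python
# Illustrative sketch: visible conditional of the BG-RBM,
# v_i | h ~ N(a_i + sum_j W_ij h_j, sigma_i^2).
import numpy as np

def sample_visible(h, a, W, sigma, rng):
    mean = a + W @ h                     # per-unit Gaussian mean
    return rng.normal(loc=mean, scale=sigma)
```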

1.3 Derivative of log-likelihood function

$$\begin{array}{@{}rcl@{}} \frac{\partial\ln P(\textbf{v})}{\partial{W_{ij}}} & =& \frac{\partial\ln{\sum}_{\textbf{h}}e^{-E(\textbf{v},\textbf{h})} }{\partial{W_{ij}}}-\frac{\partial\ln{\sum}_{\textbf{h}}{\sum}_{\textbf{v}}e^{-E(\textbf{v},\textbf{h})} }{\partial{W_{ij}}} \\ & =& \frac{1}{{\sum}_{\textbf{h}}e^{-E(\textbf{v},\textbf{h})}} \left( {\sum}_{\textbf{h}}e^{-E(\textbf{v},\textbf{h})}\cdot\left( -\frac{\partial E(\textbf{v},\textbf{h})}{\partial{W_{ij}}}\right) \right) \\&&- \frac{1}{{\sum}_{\textbf{v},\textbf{h}}e^{-E(\textbf{v},\textbf{h})}} \left( {\sum}_{\textbf{v},\textbf{h}}e^{-E(\textbf{v},\textbf{h})}\cdot\left( -\frac{\partial E(\textbf{v},\textbf{h})}{\partial{W_{ij}}}\right) \right) \\ & =& - {\sum}_{\textbf{h}}\frac{e^{-E(\textbf{v},\textbf{h})}}{{\sum}_{\textbf{h}} e^{-E(\textbf{v},\textbf{h})}}\cdot\frac{\partial E(\textbf{v},\textbf{h})}{\partial{W_{ij}}} \\&&+ {\sum}_{\textbf{v},\textbf{h}}\frac{e^{-E(\textbf{v},\textbf{h})}}{{\sum}_{\textbf{v},\textbf{h}} e^{-E(\textbf{v},\textbf{h})}}\cdot\frac{\partial E(\textbf{v},\textbf{h})}{\partial{W_{ij}}} \\ & =& -{\sum}_{\textbf{h}} P(\textbf{h}|\textbf{v})\cdot\frac{\partial E(\textbf{v},\textbf{h})}{\partial{W_{ij}}} + {\sum}_{\textbf{v},\textbf{h}} P(\textbf{v},\textbf{h})\cdot\frac{\partial E(\textbf{v},\textbf{h})}{\partial{W_{ij}}} \end{array} $$
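For the BG-RBM energy function above, the derivative appearing in both terms is

$$\frac{\partial E\left( \textbf{v},\textbf{h}\right)_{BG}}{\partial{W_{ij}}} = -\frac{v_{i}}{{\sigma_{i}^{2}}}h_{j}, $$

which is substituted into the approximation below.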

1.4 Derivative of log-likelihood approximation

$$\begin{array}{@{}rcl@{}} \frac{\partial\ln P(\textbf{v})}{\partial{W_{ij}}} & \simeq& -\sum\limits_{\textbf{h}} P(\textbf{h}|\textbf{v}) \frac{\partial E(\textbf{v},\textbf{h})}{\partial{W_{ij}}} + {\sum}_{\tilde{\textbf{v}}} P(\tilde{\textbf{v}}) {\sum}_{\tilde{\textbf{h}}} P(\tilde{\textbf{h}}|\tilde{\textbf{v}}) \frac{\partial E(\tilde{\textbf{v}},\tilde{\textbf{h}})}{\partial{W_{ij}}} \\ & \simeq& \sum\limits_{\textbf{h}} P(\textbf{h}|\textbf{v})\frac{v_{i}h_{j}}{{\sigma_{i}^{2}}} - {\sum}_{\tilde{\textbf{v}}}P(\tilde{\textbf{v}}) {\sum}_{\tilde{\textbf{h}}} P(\tilde{\textbf{h}}|\tilde{\textbf{v}})\frac{\tilde{v}_{i}\tilde{h}_{j}}{{\sigma_{i}^{2}}} \\ & \simeq& P(h_{j} = 1|\textbf{v})\frac{v_{i}}{{\sigma_{i}^{2}}}-{\sum}_{\tilde{\textbf{v}}}P(\tilde{\textbf{v}})P(\tilde{h}_{j} = 1|\tilde{\textbf{v}})\frac{\tilde{v}_{i}}{{\sigma_{i}^{2}}} \end{array} $$

Applying the same procedure to the bias parameters, we obtain the following gradient estimates:

$$\begin{array}{@{}rcl@{}} \frac{\partial\ln P(\textbf{v})}{\partial{W_{ij}}} & \simeq& \frac{v_{i} h_{j} - \tilde{v}_{i} \tilde{h}_{j}}{{\sigma_{i}^{2}}} \\ \frac{\partial\ln P(\textbf{v})}{\partial{a_{i}}} & \simeq& \frac{v_{i} - \tilde{v}_{i} }{{\sigma_{i}^{2}}} \\ \frac{\partial\ln P(\textbf{v})}{\partial{b_{j}}} & \simeq& h_{j} - \tilde{h}_{j} \end{array} $$
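A minimal sketch of one CD-1 style parameter update built from these estimates (illustrative only, not the authors' implementation); it reuses the sample_hidden and sample_visible helpers sketched above, and lr is an assumed learning rate.

```python
# Illustrative sketch: one CD-1 update for the BG-RBM using the gradient
# estimates above, with the "tilde" samples produced by a single Gibbs step.
import numpy as np

def cd1_update(v0, a, b, W, sigma, lr, rng):
    # Positive phase: hidden sample driven by the data vector v0.
    h0 = sample_hidden(v0, b, W, sigma, rng)
    # Negative phase: one Gibbs step gives the reconstruction (tilde) samples.
    v1 = sample_visible(h0, a, W, sigma, rng)
    h1 = sample_hidden(v1, b, W, sigma, rng)

    inv_var = 1.0 / sigma ** 2
    W += lr * (np.outer(v0 * inv_var, h0) - np.outer(v1 * inv_var, h1))
    a += lr * (v0 - v1) * inv_var
    b += lr * (h0 - h1)
    return a, b, W
```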

Cite this article

Sanjaya, P., Kang, DK. Optimizing restricted Boltzmann machine learning by injecting Gaussian noise to likelihood gradient approximation. Appl Intell 49, 2723–2734 (2019). https://doi.org/10.1007/s10489-018-01400-5