
Averaged learning equations of error-function-based multilayer perceptrons

Original Article, Neural Computing and Applications

Abstract

Multilayer perceptrons (MLPs) exhibit strange behaviors during learning that are caused by singularities in their parameter space. A detailed theoretical or numerical analysis of MLPs is difficult because the traditional log-sigmoid activation function is not integrable in closed form, which makes the averaged learning equations (ALEs) hard to obtain. In this paper, the error function is proposed as the activation function of the MLPs. By deriving explicit expressions for two important expectations, we obtain the averaged learning equations, which enable further analysis of the learning dynamics of MLPs. The simulation results also indicate that the ALEs play a significant role in investigating the singular behaviors of MLPs.




Acknowledgments

This project is supported by the National Natural Science Foundation of China under Grant 61374006, the Major Program of the National Natural Science Foundation of China under Grant 11190015, and the Research Fund for the Doctoral Program of Higher Education of China under Grant 20100092110020.

Author information


Correspondence to Weili Guo.

Appendix

From Eq. (1), we have

$$y-f_0(\boldsymbol{x})=\varepsilon \sim \mathcal{N}(0,1),$$
(26)

then

$$\frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{+\infty}\exp\left(-\frac{1}{2}(y-f_0(\boldsymbol{x}))^2\right)\mathrm{d}y =\frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{+\infty}\exp\left(-\frac{\varepsilon^2}{2}\right)\mathrm{d}\varepsilon=1.$$
(27)
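As an illustrative aside, not part of the original derivation, the identities in this appendix admit simple numerical checks. The sketches below assume the error-function-based activation is the standard normal CDF, \(\phi(u)=\tfrac{1}{2}(1+\operatorname{erf}(u/\sqrt{2}))\), an assumption consistent with the derivative of \(\phi\) used in (30); the paper's own definition of \(\phi\) is in the main text, which is not reproduced here. The helper names phi and dphi are ours. First, (27) checked by quadrature:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

# Assumed error-function-based activation (standard normal CDF), chosen to be
# consistent with the derivative of phi that appears in Eq. (30):
#   phi(u)  = (1 + erf(u / sqrt(2))) / 2
#   phi'(u) = exp(-u**2 / 2) / sqrt(2 * pi)
def phi(u):
    return 0.5 * (1.0 + erf(u / np.sqrt(2.0)))

def dphi(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

# Eq. (27): the Gaussian density of eps = y - f_0(x) integrates to 1.
val, _ = quad(lambda e: np.exp(-0.5 * e**2) / np.sqrt(2.0 * np.pi),
              -np.inf, np.inf)
print(val)  # ~ 1.0
```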

\(P_1(\boldsymbol{s},\boldsymbol{v})\) and \(P_2(\boldsymbol{s},\boldsymbol{v})\) can be rewritten as

$$\begin{aligned} P_1(\boldsymbol{s},\boldsymbol{v})&=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\phi(\boldsymbol{v}^T\boldsymbol{x}) \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\times \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(y-f_0(\boldsymbol{x})\right)^2\right)\mathrm{d}y\,\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\phi(\boldsymbol{v}^T\boldsymbol{x}) \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}, \end{aligned}$$
(28)
$$\begin{aligned} P_2(\boldsymbol{s},\boldsymbol{v})&=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\frac{\partial\phi(\boldsymbol{v}^T\boldsymbol{x})}{\partial\boldsymbol{v}} \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\times \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(y-f_0(\boldsymbol{x})\right)^2\right)\mathrm{d}y\,\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\frac{\partial\phi(\boldsymbol{v}^T\boldsymbol{x})}{\partial\boldsymbol{v}} \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}. \end{aligned}$$
(29)
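Because the weight \((2\pi)^{-n/2}\exp(-\tfrac{1}{2}\|\boldsymbol{x}\|^2)\) is the \(\mathcal{N}(\boldsymbol{0},\mathbf{I}_n)\) density, (28) and (29) are expectations over a standard Gaussian input and can be estimated by Monte Carlo. A minimal sketch, reusing phi and dphi above (the estimator names mc_P1 and mc_P2 are ours):

```python
rng = np.random.default_rng(0)

def mc_P1(s, v, m=1_000_000):
    # Eq. (28): E[ phi(s^T x) * phi(v^T x) ] with x ~ N(0, I_n).
    x = rng.standard_normal((m, s.size))
    return np.mean(phi(x @ s) * phi(x @ v))

def mc_P2(s, v, m=1_000_000):
    # Eq. (29): the vector-valued expectation E[ x * phi(s^T x) * phi'(v^T x) ],
    # using d/dv phi(v^T x) = phi'(v^T x) * x.
    x = rng.standard_normal((m, s.size))
    w = phi(x @ s) * dphi(x @ v)
    return (x * w[:, None]).mean(axis=0)
```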

Evaluating the Gaussian integral in (29) explicitly, we have:

$$\begin{aligned} P_2(\boldsymbol{s},\boldsymbol{v})&=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\frac{\partial\phi(\boldsymbol{v}^T\boldsymbol{x})}{\partial\boldsymbol{v}} \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n+1}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\, \boldsymbol{x}\exp\left(-\frac{1}{2}(\boldsymbol{v}^T\boldsymbol{x})^2\right)\exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n+1}{2}}\int\limits_{-\infty}^{\infty}\boldsymbol{x}\,\phi(\boldsymbol{s}^T\boldsymbol{x}) \exp\left(-\frac{1}{2}\left(\|\boldsymbol{x}\|^2+(\boldsymbol{v}^T\boldsymbol{x})^2\right)\right)\mathrm{d}\boldsymbol{x}\\ &=\frac{1}{2\pi}\sqrt{\det\left(\boldsymbol{B}^{-1}\right)}\,\boldsymbol{A}^{-1}\boldsymbol{s}, \end{aligned}$$
(30)

where

$$\boldsymbol{A}=\mathbf{I}_n+\boldsymbol{v}\boldsymbol{v}^T,$$
(31)
$$\boldsymbol{B}=\boldsymbol{A}+\boldsymbol{s}\boldsymbol{s}^T.$$
(32)
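Continuing the illustrative sketch, the closed form (30), with \(\boldsymbol{A}\) and \(\boldsymbol{B}\) as in (31) and (32), can be compared against the Monte Carlo estimator mc_P2 above (closed_P2 is our name):

```python
def closed_P2(s, v):
    # Eq. (30), with A and B from Eqs. (31)-(32).
    n = s.size
    A = np.eye(n) + np.outer(v, v)
    B = A + np.outer(s, s)
    return np.sqrt(1.0 / np.linalg.det(B)) * np.linalg.solve(A, s) / (2.0 * np.pi)

s = np.array([0.3, -0.7, 0.5])
v = np.array([0.2, 0.4, -0.1])
print(closed_P2(s, v))  # should agree with mc_P2(s, v) to ~3 decimal places
```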

According to the Sherman–Morrison formula, we have:

$$\boldsymbol{A}^{-1}=\left(\mathbf{I}_n+\boldsymbol{v}\boldsymbol{v}^T\right)^{-1}=\mathbf{I}_n-\frac{\boldsymbol{v}\boldsymbol{v}^T}{1+\|\boldsymbol{v}\|^2},$$
(33)
$$\boldsymbol{B}^{-1}=\left(\boldsymbol{A}+\boldsymbol{s}\boldsymbol{s}^T\right)^{-1} =\left(\mathbf{I}_n-\frac{\boldsymbol{A}^{-1}\boldsymbol{s}\boldsymbol{s}^T}{1+\boldsymbol{s}^T\boldsymbol{A}^{-1}\boldsymbol{s}}\right)\boldsymbol{A}^{-1}.$$
(34)
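Both inverses are easy to verify numerically (again an illustrative check, continuing the sketch above; the variable names are ours):

```python
n = 4
rng2 = np.random.default_rng(1)
s4, v4 = rng2.standard_normal(n), rng2.standard_normal(n)

A = np.eye(n) + np.outer(v4, v4)                       # Eq. (31)
Ainv = np.eye(n) - np.outer(v4, v4) / (1.0 + v4 @ v4)  # Eq. (33)
assert np.allclose(Ainv, np.linalg.inv(A))

B = A + np.outer(s4, s4)                               # Eq. (32)
Binv = (np.eye(n)                                      # Eq. (34)
        - np.outer(Ainv @ s4, s4) / (1.0 + s4 @ Ainv @ s4)) @ Ainv
assert np.allclose(Binv, np.linalg.inv(B))
```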

By using Sylvester’s determinant theorem, we have:

$$\det\left(\boldsymbol{A}^{-1}\right)=1-\frac{\|\boldsymbol{v}\|^2}{1+\|\boldsymbol{v}\|^2}=\frac{1}{1+\|\boldsymbol{v}\|^2},$$
(35)
$$\det\left(\boldsymbol{B}^{-1}\right)=\frac{1}{\left(1+\|\boldsymbol{s}\|^2\right)\left(1+\|\boldsymbol{v}\|^2\right)-(\boldsymbol{s}^T\boldsymbol{v})^2}.$$
(36)
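And likewise for the two determinants, reusing Ainv and Binv from the previous check:

```python
# Eq. (35): det(A^{-1}) = 1 / (1 + ||v||^2)
assert np.isclose(np.linalg.det(Ainv), 1.0 / (1.0 + v4 @ v4))

# Eq. (36): det(B^{-1}) = 1 / ((1 + ||s||^2)(1 + ||v||^2) - (s^T v)^2)
assert np.isclose(np.linalg.det(Binv),
                  1.0 / ((1.0 + s4 @ s4) * (1.0 + v4 @ v4) - (s4 @ v4) ** 2))
```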

According to the Leibniz integral rule, the following equation holds:

$$P_2(\boldsymbol{s},\boldsymbol{v})=\frac{\partial P_1(\boldsymbol{s},\boldsymbol{v})}{\partial\boldsymbol{v}}.$$
(37)

From (37), we can obtain \(P_1\) by integrating \(P_2\) with respect to \(\boldsymbol{v}\), which yields

$$\begin{aligned} P_1(\boldsymbol{s},\boldsymbol{v})&=\int\limits_{-\infty}^{\boldsymbol{v}}P_2(\boldsymbol{s},\boldsymbol{v})\,\mathrm{d}\boldsymbol{v}\\ &=\frac{1}{2\pi}\int\limits_{-\infty}^{\boldsymbol{v}}\frac{\boldsymbol{A}^{-1}\boldsymbol{s}}{\sqrt{(1+\|\boldsymbol{s}\|^2)(1+\|\boldsymbol{v}\|^2)-(\boldsymbol{s}^T\boldsymbol{v})^2}}\,\mathrm{d}\boldsymbol{v}\\ &=\frac{1}{2\pi}\left(\arcsin\frac{\boldsymbol{s}^T\boldsymbol{v}}{\sqrt{1+\|\boldsymbol{s}\|^2}\sqrt{1+\|\boldsymbol{v}\|^2}}+C\right), \end{aligned}$$
(38)

where \(C\) is a constant of integration.

From (28), we know that \(P_1(\boldsymbol{0},\boldsymbol{0})=\phi(0)^2=\frac{1}{4}\), so \(C=\frac{\pi}{2}\). Finally, we get

$$P_1(\boldsymbol{s},\boldsymbol{v})=\frac{1}{2\pi}\arcsin\frac{\boldsymbol{s}^T\boldsymbol{v}}{\sqrt{1+\|\boldsymbol{s}\|^2}\sqrt{1+\|\boldsymbol{v}\|^2}}+\frac{1}{4}.$$
(39)
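Finally, as an illustrative closing check under the same assumption on \(\phi\), (39) can be compared against the Monte Carlo estimate of (28), and the relation (37) can be verified by central finite differences (closed_P1 is our name; s and v are the test vectors from the closed_P2 check):

```python
def closed_P1(s, v):
    # Eq. (39)
    rho = (s @ v) / (np.sqrt(1.0 + s @ s) * np.sqrt(1.0 + v @ v))
    return np.arcsin(rho) / (2.0 * np.pi) + 0.25

print(closed_P1(s, v), mc_P1(s, v))  # should agree to ~3 decimal places

# Eq. (37): central finite differences of P1 in v recover closed_P2.
eps = 1e-5
grad = np.array([(closed_P1(s, v + eps * e) - closed_P1(s, v - eps * e)) / (2 * eps)
                 for e in np.eye(s.size)])
print(grad)  # should match closed_P2(s, v)
```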

Cite this article

Guo, W., Wei, H., Zhao, J. et al. Averaged learning equations of error-function-based multilayer perceptrons. Neural Comput & Applic 25, 825–832 (2014). https://doi.org/10.1007/s00521-014-1557-5

