
Averaged learning equations of error-function-based multilayer perceptrons

Original Article, Neural Computing and Applications

Abstract

Multilayer perceptrons (MLPs) exhibit strange behaviors during learning that are caused by singularities in their parameter space. A detailed theoretical or numerical analysis of MLPs is difficult because the traditional log-sigmoid activation function is not integrable in closed form, which makes the averaged learning equations (ALEs) hard to obtain. In this paper, the error function is proposed as the activation function of the MLPs. By deriving explicit expressions for two important expectations, we obtain the averaged learning equations, which enable further analysis of the learning dynamics of MLPs. The simulation results also indicate that the ALEs play a significant role in investigating the singular behaviors of MLPs.




Acknowledgments

This project is supported by the National Natural Science Foundation of China under Grant 61374006, the Major Program of the National Natural Science Foundation of China under Grant 11190015, and the Research Fund for the Doctoral Program of Higher Education of China under Grant 20100092110020.

Author information


Correspondence to Weili Guo.

Appendix

From Eq. (1), we have

$$y-f_0(\boldsymbol{x})=\varepsilon \sim \mathcal{N}(0,1),$$
(26)

then

$$\frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{+\infty}\exp\left(-\frac{1}{2}(y-f_0(\boldsymbol{x}))^2\right)\mathrm{d}y =\frac{1}{\sqrt{2\pi}}\int\limits_{-\infty}^{+\infty}\exp\left(-\frac{\varepsilon^2}{2}\right)\mathrm{d}\varepsilon=1.$$
(27)
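As an illustrative aside, not part of the original derivation, the identities in this appendix admit simple numerical checks. The sketches below assume the error-function-based activation is the standard normal CDF, \(\phi(u)=\tfrac{1}{2}(1+\operatorname{erf}(u/\sqrt{2}))\), an assumption consistent with the derivative of \(\phi\) used in (30); the paper's own definition of \(\phi\) is in the main text, which is not reproduced here. The helper names phi and dphi are ours. First, (27) checked by quadrature:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erf

# Assumed error-function-based activation (standard normal CDF), chosen to be
# consistent with the derivative of phi that appears in Eq. (30):
#   phi(u)  = (1 + erf(u / sqrt(2))) / 2
#   phi'(u) = exp(-u**2 / 2) / sqrt(2 * pi)
def phi(u):
    return 0.5 * (1.0 + erf(u / np.sqrt(2.0)))

def dphi(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

# Eq. (27): the Gaussian density of eps = y - f_0(x) integrates to 1.
val, _ = quad(lambda e: np.exp(-0.5 * e**2) / np.sqrt(2.0 * np.pi),
              -np.inf, np.inf)
print(val)  # ~ 1.0
```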

\(P_1(\boldsymbol{s},\boldsymbol{v})\) and \(P_2(\boldsymbol{s},\boldsymbol{v})\) can be rewritten as

$$\begin{aligned} P_1(\boldsymbol{s},\boldsymbol{v})&=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\phi(\boldsymbol{v}^T\boldsymbol{x}) \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\times \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(y-f_0(\boldsymbol{x})\right)^2\right)\mathrm{d}y\,\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\phi(\boldsymbol{v}^T\boldsymbol{x}) \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}, \end{aligned}$$
(28)
$$\begin{aligned} P_2(\boldsymbol{s},\boldsymbol{v})&=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\frac{\partial\phi(\boldsymbol{v}^T\boldsymbol{x})}{\partial\boldsymbol{v}} \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\times \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(y-f_0(\boldsymbol{x})\right)^2\right)\mathrm{d}y\,\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\frac{\partial\phi(\boldsymbol{v}^T\boldsymbol{x})}{\partial\boldsymbol{v}} \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}. \end{aligned}$$
(29)
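Because the weight \((2\pi)^{-n/2}\exp(-\tfrac{1}{2}\|\boldsymbol{x}\|^2)\) is the \(\mathcal{N}(\boldsymbol{0},\mathbf{I}_n)\) density, (28) and (29) are expectations over a standard Gaussian input and can be estimated by Monte Carlo. A minimal sketch, reusing phi and dphi above (the estimator names mc_P1 and mc_P2 are ours):

```python
rng = np.random.default_rng(0)

def mc_P1(s, v, m=1_000_000):
    # Eq. (28): E[ phi(s^T x) * phi(v^T x) ] with x ~ N(0, I_n).
    x = rng.standard_normal((m, s.size))
    return np.mean(phi(x @ s) * phi(x @ v))

def mc_P2(s, v, m=1_000_000):
    # Eq. (29): the vector-valued expectation E[ x * phi(s^T x) * phi'(v^T x) ],
    # using d/dv phi(v^T x) = phi'(v^T x) * x.
    x = rng.standard_normal((m, s.size))
    w = phi(x @ s) * dphi(x @ v)
    return (x * w[:, None]).mean(axis=0)
```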

Evaluating the Gaussian integral in (29) explicitly, we have:

$$\begin{aligned} P_2(\boldsymbol{s},\boldsymbol{v})&=(2\pi)^{-\frac{n}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\frac{\partial\phi(\boldsymbol{v}^T\boldsymbol{x})}{\partial\boldsymbol{v}} \exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n+1}{2}}\int\limits_{-\infty}^{\infty}\phi(\boldsymbol{s}^T\boldsymbol{x})\, \boldsymbol{x}\exp\left(-\frac{1}{2}(\boldsymbol{v}^T\boldsymbol{x})^2\right)\exp\left(-\frac{1}{2}\|\boldsymbol{x}\|^2\right)\mathrm{d}\boldsymbol{x}\\ &=(2\pi)^{-\frac{n+1}{2}}\int\limits_{-\infty}^{\infty}\boldsymbol{x}\,\phi(\boldsymbol{s}^T\boldsymbol{x}) \exp\left(-\frac{1}{2}\left(\|\boldsymbol{x}\|^2+(\boldsymbol{v}^T\boldsymbol{x})^2\right)\right)\mathrm{d}\boldsymbol{x}\\ &=\frac{1}{2\pi}\sqrt{\det\left(\boldsymbol{B}^{-1}\right)}\,\boldsymbol{A}^{-1}\boldsymbol{s}, \end{aligned}$$
(30)

where

$$\boldsymbol{A}=\mathbf{I}_n+\boldsymbol{v}\boldsymbol{v}^T,$$
(31)
$$\boldsymbol{B}=\boldsymbol{A}+\boldsymbol{s}\boldsymbol{s}^T.$$
(32)
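Continuing the illustrative sketch, the closed form (30), with \(\boldsymbol{A}\) and \(\boldsymbol{B}\) as in (31) and (32), can be compared against the Monte Carlo estimator mc_P2 above (closed_P2 is our name):

```python
def closed_P2(s, v):
    # Eq. (30), with A and B from Eqs. (31)-(32).
    n = s.size
    A = np.eye(n) + np.outer(v, v)
    B = A + np.outer(s, s)
    return np.sqrt(1.0 / np.linalg.det(B)) * np.linalg.solve(A, s) / (2.0 * np.pi)

s = np.array([0.3, -0.7, 0.5])
v = np.array([0.2, 0.4, -0.1])
print(closed_P2(s, v))  # should agree with mc_P2(s, v) to ~3 decimal places
```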

According to the Sherman–Morrison formula, we have:

$$\boldsymbol{A}^{-1}=\left(\mathbf{I}_n+\boldsymbol{v}\boldsymbol{v}^T\right)^{-1}=\mathbf{I}_n-\frac{\boldsymbol{v}\boldsymbol{v}^T}{1+\|\boldsymbol{v}\|^2},$$
(33)
$$\boldsymbol{B}^{-1}=\left(\boldsymbol{A}+\boldsymbol{s}\boldsymbol{s}^T\right)^{-1} =\left(\mathbf{I}_n-\frac{\boldsymbol{A}^{-1}\boldsymbol{s}\boldsymbol{s}^T}{1+\boldsymbol{s}^T\boldsymbol{A}^{-1}\boldsymbol{s}}\right)\boldsymbol{A}^{-1}.$$
(34)
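Both inverses are easy to verify numerically (again an illustrative check, continuing the sketch above; the variable names are ours):

```python
n = 4
rng2 = np.random.default_rng(1)
s4, v4 = rng2.standard_normal(n), rng2.standard_normal(n)

A = np.eye(n) + np.outer(v4, v4)                       # Eq. (31)
Ainv = np.eye(n) - np.outer(v4, v4) / (1.0 + v4 @ v4)  # Eq. (33)
assert np.allclose(Ainv, np.linalg.inv(A))

B = A + np.outer(s4, s4)                               # Eq. (32)
Binv = (np.eye(n)                                      # Eq. (34)
        - np.outer(Ainv @ s4, s4) / (1.0 + s4 @ Ainv @ s4)) @ Ainv
assert np.allclose(Binv, np.linalg.inv(B))
```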

By using Sylvester’s determinant theorem, we have:

$$\det\left(\boldsymbol{A}^{-1}\right)=1-\frac{\|\boldsymbol{v}\|^2}{1+\|\boldsymbol{v}\|^2}=\frac{1}{1+\|\boldsymbol{v}\|^2},$$
(35)
$$\det\left(\boldsymbol{B}^{-1}\right)=\frac{1}{\left(1+\|\boldsymbol{s}\|^2\right)\left(1+\|\boldsymbol{v}\|^2\right)-(\boldsymbol{s}^T\boldsymbol{v})^2}.$$
(36)
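And likewise for the two determinants, reusing Ainv and Binv from the previous check:

```python
# Eq. (35): det(A^{-1}) = 1 / (1 + ||v||^2)
assert np.isclose(np.linalg.det(Ainv), 1.0 / (1.0 + v4 @ v4))

# Eq. (36): det(B^{-1}) = 1 / ((1 + ||s||^2)(1 + ||v||^2) - (s^T v)^2)
assert np.isclose(np.linalg.det(Binv),
                  1.0 / ((1.0 + s4 @ s4) * (1.0 + v4 @ v4) - (s4 @ v4) ** 2))
```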

According to the Leibniz integral rule, the following equation holds:

$$P_2(\boldsymbol{s},\boldsymbol{v})=\frac{\partial P_1(\boldsymbol{s},\boldsymbol{v})}{\partial\boldsymbol{v}}.$$
(37)

From (37), we can obtain \(P_1\) by integrating \(P_2\) with respect to \(\boldsymbol{v}\), which yields

$$\begin{aligned} P_1(\boldsymbol{s},\boldsymbol{v})&=\int\limits_{-\infty}^{\boldsymbol{v}}P_2(\boldsymbol{s},\boldsymbol{v})\,\mathrm{d}\boldsymbol{v}\\ &=\frac{1}{2\pi}\int\limits_{-\infty}^{\boldsymbol{v}}\frac{\boldsymbol{A}^{-1}\boldsymbol{s}}{\sqrt{(1+\|\boldsymbol{s}\|^2)(1+\|\boldsymbol{v}\|^2)-(\boldsymbol{s}^T\boldsymbol{v})^2}}\,\mathrm{d}\boldsymbol{v}\\ &=\frac{1}{2\pi}\left(\arcsin\frac{\boldsymbol{s}^T\boldsymbol{v}}{\sqrt{1+\|\boldsymbol{s}\|^2}\sqrt{1+\|\boldsymbol{v}\|^2}}+C\right), \end{aligned}$$
(38)

where \(C\) is a constant of integration.

From (28), we know that \(P_1(\boldsymbol{0},\boldsymbol{0})=\phi(0)^2=\frac{1}{4}\), so \(C=\frac{\pi}{2}\). Finally, we get

$$P_1(\boldsymbol{s},\boldsymbol{v})=\frac{1}{2\pi}\arcsin\frac{\boldsymbol{s}^T\boldsymbol{v}}{\sqrt{1+\|\boldsymbol{s}\|^2}\sqrt{1+\|\boldsymbol{v}\|^2}}+\frac{1}{4}.$$
(39)
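Finally, as an illustrative closing check under the same assumption on \(\phi\), (39) can be compared against the Monte Carlo estimate of (28), and the relation (37) can be verified by central finite differences (closed_P1 is our name; s and v are the test vectors from the closed_P2 check):

```python
def closed_P1(s, v):
    # Eq. (39)
    rho = (s @ v) / (np.sqrt(1.0 + s @ s) * np.sqrt(1.0 + v @ v))
    return np.arcsin(rho) / (2.0 * np.pi) + 0.25

print(closed_P1(s, v), mc_P1(s, v))  # should agree to ~3 decimal places

# Eq. (37): central finite differences of P1 in v recover closed_P2.
eps = 1e-5
grad = np.array([(closed_P1(s, v + eps * e) - closed_P1(s, v - eps * e)) / (2 * eps)
                 for e in np.eye(s.size)])
print(grad)  # should match closed_P2(s, v)
```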

Cite this article

Guo, W., Wei, H., Zhao, J. et al. Averaged learning equations of error-function-based multilayer perceptrons. Neural Comput & Applic 25, 825–832 (2014). https://doi.org/10.1007/s00521-014-1557-5

