Neural Networks

Volume 8, Issue 9, 1995, Pages 1379-1408

Invited article
Information geometry of the EM and em algorithms for neural networks
Shun-ichi Amari

https://doi.org/10.1016/0893-6080(95)00003-8

Abstract

To realize an input-output relation given by noise-contaminated examples, it is effective to use a stochastic model of neural networks. When the model network includes hidden units whose activation values are neither specified nor observed, it is useful to estimate the hidden variables from the observed or specified input-output data on the basis of the stochastic model. Two algorithms, the EM and em algorithms, have so far been proposed for this purpose. The EM algorithm is an iterative statistical technique based on the conditional expectation, and the em algorithm is a geometrical one derived from information geometry. The em algorithm iteratively minimizes the Kullback-Leibler divergence in the manifold of neural networks. These two algorithms are equivalent in most cases. The present paper gives a unified information-geometrical framework for studying stochastic models of neural networks, focusing on the EM and em algorithms, and proves a condition that guarantees their equivalence. Examples include: (1) the stochastic multilayer perceptron, (2) mixtures of experts, and (3) the normal mixture model.
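
In schematic form, the alternating Kullback-Leibler minimization described above can be written as a pair of projections between the model manifold M (the parametrized network distributions p_theta) and the data manifold D (the distributions whose visible marginals agree with the observed data). The notation below is a standard rendering of this idea, not a formula quoted from the paper:

$$ q^{(t+1)} \;=\; \arg\min_{q \in D} \; \mathrm{KL}\big[\, q \,\big\|\, p_{\theta^{(t)}} \big] \qquad \text{(e-step: e-projection onto } D\text{)} $$

$$ \theta^{(t+1)} \;=\; \arg\min_{\theta} \; \mathrm{KL}\big[\, q^{(t+1)} \,\big\|\, p_{\theta} \big] \qquad \text{(m-step: m-projection onto } M\text{)} $$

where $\mathrm{KL}[q\,\|\,p] = \int q \log (q/p)$. The EM algorithm obtains its E-step by taking the conditional expectation of the hidden variables under $p_{\theta^{(t)}}$, which in many models coincides with the e-projection above.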
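
As a concrete illustration of example (3), the following is a minimal Python sketch of the textbook EM recursion for a two-component normal mixture, in which the discarded component labels play the role of the hidden variables. The component count, initial values, and synthetic data are illustrative assumptions made here, not details taken from the paper.

    # Minimal sketch: textbook EM for a two-component normal mixture.
    # Data, component count, and initial values are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic 1-D observations drawn from two Gaussians; the component
    # labels are discarded and act as the unobserved hidden variables.
    x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 200)])

    # Initial guesses for the mixing weights, means, and variances.
    w = np.array([0.5, 0.5])
    mu = np.array([-1.0, 1.0])
    var = np.array([1.0, 1.0])

    for _ in range(100):
        # E-step: posterior responsibility of each component for each point
        # (the conditional expectation of the hidden indicator variables).
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the expected sufficient
        # statistics, maximizing the expected complete-data log-likelihood.
        n_k = r.sum(axis=0)
        w = n_k / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_k
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

    print("weights:", w, "means:", mu, "variances:", var)

Each pass alternates the conditional-expectation step over the hidden labels with a re-estimation of the mixture parameters, mirroring the EM/em alternation described in the abstract.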

References (53)

  • S. Amari, The EM algorithm and information geometry in neural network learning, Neural Computation (1995)
  • S. Amari et al., Differential geometry in statistical inferences
  • S. Amari et al., Statistical inference under multiterminal rate restrictions—a differential geometrical approach, IEEE Transactions on Information Theory (1989)
  • S. Amari et al., Information geometry of estimating functions in semiparametric statistical models
  • S. Amari et al., Information geometry of Boltzmann machines, IEEE Transactions on Neural Networks (1992)
  • P. Baldi et al., Smooth on-line learning algorithm for hidden Markov models, Neural Computation (1994)
  • O.E. Barndorff-Nielsen, Information and exponential families in statistical theory (1978)
  • O.E. Barndorff-Nielsen, Parametric statistical model and likelihood
  • O.E. Barndorff-Nielsen et al., The role of differential geometry in statistical theory, International Statistical Review (1986)
  • O.E. Barndorff-Nielsen et al., Approximating exponential models, Annals of the Institute of Statistical Mathematics (1989)
  • L.E. Baum et al., A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Annals of Mathematical Statistics (1970)
  • J. Besag et al., Spatial statistics and Bayesian computation, Journal of the Royal Statistical Society (1993)
  • W. Byrne, Alternating minimization and Boltzmann machine learning, IEEE Transactions on Neural Networks (1992)
  • B. Cheng et al., Neural networks—a review from statistical perspectives—comments and rejoinders, Statistical Science (1994)
  • N.N. Chentsov, Statistical decision rules and optimal inference (1972)
  • D.R. Cox et al., Theoretical statistics (1974)
Cited by (257)

  • Natural Reweighted Wake–Sleep, Neural Networks (2022)

    Citation excerpt: In their work the authors show conditions for the theoretical convergence of a modified version of the Wake–Sleep algorithm, identified as a variant of the geometric em algorithm. The convergence of the em algorithm and its relation to the Expectation–Maximization (EM) optimization process is known in the literature and has in particular been studied by Amari (1995) and Fujiwara and Amari (1995). Notice that the algorithm by Ikeda et al. uses the exact FIM, while in the present work we are employing an estimate of the gradients and of the FIM based on the minibatch.
