Abstract

In this paper, we derive an EM algorithm for nonlinear state space models. We use it to jointly estimate the neural network weights, the model uncertainty, and the noise in the data. In the E-step, we apply a forward-backward Rauch-Tung-Striebel smoother to compute the network weights. In the M-step, we derive expressions for the model uncertainty and the measurement noise. We find the method to be powerful, simple, and stable.
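The E-step/M-step split described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single scalar weight that follows a random walk and a measurement that is linear in that weight, so an exact Kalman filter and Rauch-Tung-Striebel smoother can be used without linearising a network. The function names (`kalman_rts`, `em_step`, `fit_em`), the variance symbols `q` (model uncertainty) and `r` (measurement noise), and the fixed prior `m0`, `p0` are all hypothetical choices made for illustration.

```python
import math
import random


def kalman_rts(y, h, q, r, m0=0.0, p0=1.0):
    """E-step for the assumed model w_t = w_{t-1} + u_t, y_t = h_t * w_t + v_t.

    Returns smoothed means, smoothed variances, lag-one covariances
    Cov(w_t, w_{t-1} | y_1..T), and the log-likelihood of the data.
    """
    T = len(y)
    mf, pf = [m0], [p0]      # filtered mean/variance; index 0 holds the prior
    pp = [None]              # pp[t] = predicted variance P_{t|t-1}
    loglik = 0.0
    for t in range(1, T + 1):
        p_pred = pf[-1] + q                   # random-walk prediction
        s = h[t - 1] ** 2 * p_pred + r        # innovation variance
        k = p_pred * h[t - 1] / s             # Kalman gain
        e = y[t - 1] - h[t - 1] * mf[-1]      # innovation
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        mf.append(mf[-1] + k * e)
        pf.append((1.0 - k * h[t - 1]) * p_pred)
        pp.append(p_pred)

    # Backward (Rauch-Tung-Striebel) pass.
    ms, ps = mf[:], pf[:]
    pc = [0.0] * (T + 1)
    for t in range(T - 1, -1, -1):
        g = pf[t] / pp[t + 1]                       # smoother gain
        ms[t] = mf[t] + g * (ms[t + 1] - mf[t])     # predicted mean equals mf[t]
        ps[t] = pf[t] + g * g * (ps[t + 1] - pp[t + 1])
        pc[t + 1] = g * ps[t + 1]                   # lag-one smoothed covariance
    return ms, ps, pc, loglik


def em_step(y, h, q, r, m0=0.0, p0=1.0):
    """One EM iteration: smooth, then re-estimate q and r in closed form."""
    ms, ps, pc, loglik = kalman_rts(y, h, q, r, m0, p0)
    T = len(y)
    # Expected squared measurement residual under the smoothed posterior.
    r_new = sum((y[t - 1] - h[t - 1] * ms[t]) ** 2 + h[t - 1] ** 2 * ps[t]
                for t in range(1, T + 1)) / T
    # Expected squared weight increment under the smoothed posterior.
    q_new = sum((ms[t] - ms[t - 1]) ** 2 + ps[t] + ps[t - 1] - 2.0 * pc[t]
                for t in range(1, T + 1)) / T
    return q_new, r_new, loglik


def fit_em(y, h, q=1.0, r=1.0, m0=0.0, p0=1.0, n_iter=50):
    """Run EM for a fixed number of iterations; the prior (m0, p0) stays fixed."""
    history = []
    for _ in range(n_iter):
        q, r, loglik = em_step(y, h, q, r, m0, p0)
        history.append(loglik)   # log-likelihood at the parameters before the update
    return q, r, history
```

Calling `fit_em(y, h)` on a sequence of targets `y` and inputs `h` returns the two variance estimates together with the log-likelihood trace, which can be inspected to monitor convergence. For a nonlinear network, the exact smoother would have to be replaced by an approximation, for example one based on linearising the network around the current weight estimate.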

Cite this article

de Freitas, J., Niranjan, M. & Gee, A. Dynamic Learning with the EM Algorithm for Neural Networks. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 26, 119–131 (2000). https://doi.org/10.1023/A:1008103718973