Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7070))

Abstract

A multilayer perceptron with a single hidden layer of sigmoidal neurons can approximate any continuous function on a compact domain to arbitrary accuracy. The size of the hidden layer determines the approximation capability of the network, and automatically selecting a suitable size for a given data set is a long-standing question. This paper addresses the problem of inferring the size of multilayer perceptron networks using the MMLD model selection criterion, which is based on the minimum message length (MML) principle. The two main contributions are: (1) a new model selection criterion for inferring fully-connected multilayer perceptrons in regression problems, and (2) an efficient algorithm for computing MMLD-type codelengths in mathematically challenging model classes. The empirical performance of the new algorithm is demonstrated on artificially generated and real data sets.
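The model class the abstract describes can be sketched in a few lines of NumPy: a fully-connected network with one sigmoidal hidden layer and a linear output, as used for regression. This is a minimal illustration of the architecture only, not the authors' implementation; all variable names and the tiny random data set are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    # Logistic transfer function applied to each hidden neuron.
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W1, b1, W2, b2):
    """Forward pass of a fully-connected MLP with one sigmoidal
    hidden layer and a linear output unit (regression setting)."""
    H = sigmoid(X @ W1 + b1)   # hidden activations, shape (n, k)
    return H @ W2 + b2         # linear output, shape (n, 1)

# Tiny demonstration: k hidden neurons on a 1-D regression input.
rng = np.random.default_rng(0)
n, d, k = 5, 1, 3              # samples, input dimension, hidden-layer size
X = rng.standard_normal((n, d))
W1 = rng.standard_normal((d, k)); b1 = np.zeros(k)
W2 = rng.standard_normal((k, 1)); b2 = np.zeros(1)
y_hat = mlp_forward(X, W1, b1, W2, b2)
print(y_hat.shape)  # (5, 1)
```

The hidden-layer size `k` is exactly the quantity the paper's MMLD criterion is designed to select: larger `k` increases approximation capability but also the codelength needed to state the model.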




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Makalic, E., Allison, L. (2013). MMLD Inference of Multilayer Perceptrons. In: Dowe, D.L. (eds) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_20

  • DOI: https://doi.org/10.1007/978-3-642-44958-1_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44957-4

  • Online ISBN: 978-3-642-44958-1

  • eBook Packages: Computer Science (R0)
