Neural Networks

Volume 14, Issue 3, 1 April 2001, Pages 257-274

Invited article
Bayesian approach for neural networks—review and case studies

https://doi.org/10.1016/S0893-6080(00)00098-8

Abstract

We give a short review of the Bayesian approach to neural network learning and demonstrate the advantages of the approach in three real applications. We discuss the Bayesian approach with emphasis on the role of prior knowledge in Bayesian models and in classical error minimization approaches. The generalization capability of a statistical model, classical or Bayesian, is ultimately based on the prior assumptions. The Bayesian approach permits propagation of uncertainty in quantities which are unknown to other assumptions in the model, which may be more generally valid or easier to guess in the problem. The case problems studied in this paper include a regression problem, a classification problem, and an inverse problem. In the most thoroughly analyzed regression problem, the best models were those with less restrictive priors. This emphasizes the major advantage of the Bayesian approach: we are not forced to guess attributes that are unknown, such as the number of degrees of freedom in the model, the non-linearity of the model with respect to each input variable, or the exact form of the distribution of the model residuals.

Introduction

In Bayesian data analysis all uncertain quantities are modeled as probability distributions, and inference is performed by constructing the posterior conditional probabilities for the unobserved variables of interest, given the observed data sample and prior assumptions. Good references for Bayesian data analysis are Berger, 1985, Bernardo and Smith, 1994, Gelman et al., 1995.
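Formally, for the unknown quantities $\theta$ and an observed data sample $D$, the posterior follows from Bayes' rule (a standard identity, restated here for reference):

\[
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \;\propto\; p(D \mid \theta)\, p(\theta),
\]

where $p(\theta)$ encodes the prior assumptions and $p(D \mid \theta)$ is the likelihood of the observed sample.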

For neural networks, the Bayesian approach was pioneered in Buntine and Weigend, 1991, MacKay, 1992, Neal, 1992, and reviewed in Bishop, 1995, MacKay, 1995, Neal, 1996. With neural networks, the main difficulty in model building is controlling the complexity of the model. It is well known that the optimal number of degrees of freedom in the model depends on the number of training samples, the amount of noise in the samples, and the complexity of the underlying function being estimated. With standard neural network techniques, the means both for determining the correct model complexity and for setting up a network with the desired complexity are rather crude and often computationally very expensive.

In the Bayesian approach, these issues can be handled in a natural and consistent way. The unknown degree of complexity is handled by defining vague (non-informative) priors for the hyperparameters that determine the model complexity, and the resulting model is averaged over all model complexities weighted by their posterior probability given the data sample. The model can be allowed to have different complexity in different parts of the model by grouping the parameters that are exchangeable (have identical role in the model) to have a common hyperparameter. If, in addition, it is assumed that the complexities are more probably similar, a hierarchical hyperprior can be defined for the variance of the hyperparameters between groups.
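As an illustration of such a grouping (the notation here is generic, not the exact parameterization used later in the paper), the weights $w_{kj}$ leaving input $k$ can share a group-specific precision, which is itself given a vague hyperprior:

\[
w_{kj} \mid \alpha_k \sim N\!\left(0, \alpha_k^{-1}\right), \qquad
\alpha_k \sim \mathrm{Gamma}(a, b),
\]

with a further hyperprior on the top-level parameters if the group complexities are assumed to be similar. Averaging the model over the posterior of the $\alpha_k$ then weights the different complexities by their support in the data.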

Another problem with standard neural network methods is the lack of tools for analyzing the results (confidence intervals for the results, such as the 10% and 90% quantiles). Bayesian analysis yields posterior predictive distributions for any variables of interest, making the computation of such intervals possible.
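For example, once draws from the posterior predictive distribution are available, such intervals reduce to empirical quantiles of the draws. A minimal sketch in Python (the array of predictive draws is simulated here as a placeholder; in practice it would come from the fitted model):

```python
import numpy as np

# Placeholder posterior predictive draws:
# rows = posterior (MCMC) draws, columns = test inputs.
rng = np.random.default_rng(0)
pred_samples = rng.normal(loc=1.0, scale=0.5, size=(4000, 25))

pred_mean = pred_samples.mean(axis=0)
# 10% and 90% quantiles give an 80% central interval for each input.
lower, upper = np.percentile(pred_samples, [10, 90], axis=0)

for i in range(3):
    print(f"input {i}: mean={pred_mean[i]:.2f}, "
          f"80% interval=({lower[i]:.2f}, {upper[i]:.2f})")
```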

In this contribution, we discuss the Bayesian approach to statistical modeling (Section 2), with emphasis on the role of prior knowledge in the modeling process. In Section 3 we give a short review of Bayesian MLP models and MCMC techniques for marginalization. We then present three real-world modeling problems, in which we assess the performance of Bayesian MLP models and compare it to standard neural network methods and other statistical models. The application problems are: (i) a regression problem of predicting the quality of concrete in a concrete manufacturing process (Section 4); (ii) approximating an inverse mapping in a tomographic image reconstruction problem (Section 5); and (iii) a classification problem of recognizing tree trunks in forest scenes (Section 6). Finally, we discuss the conclusions of our experiments in relation to other studies on Bayesian neural networks.

Section snippets

The Bayesian approach

The key principle of the Bayesian approach is to construct the posterior probability distributions for all the unknown entities in a model, given the data sample. To use the model, marginal distributions are constructed for all those entities that we are interested in, i.e. the end variables of the study. These can be the parameters in parametric models, or the predictions in (non-parametric) regression or classification tasks.
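For instance, in a regression or classification task the end variable is the prediction for a new input $x^{\mathrm{new}}$, and its marginal (posterior predictive) distribution is obtained by integrating the model output over the posterior of the parameters $\theta$:

\[
p\!\left(y^{\mathrm{new}} \mid x^{\mathrm{new}}, D\right)
= \int p\!\left(y^{\mathrm{new}} \mid x^{\mathrm{new}}, \theta\right) p(\theta \mid D)\, d\theta .
\]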

Use of the posterior probabilities requires explicit definition of the

Bayesian learning for MLP networks

In the following, we give a short overview of the Bayesian approach for neural networks. We concentrate on MLP networks and Markov chain Monte Carlo methods for computing the integrations, following the approach introduced in Neal (1992). A detailed treatment can be found in Neal (1996), which also describes the use of the flexible Bayesian modeling (FBM) software package that was the main tool used in the case problems reviewed in this
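In practice the predictive integral is approximated by averaging the network output over posterior samples of the weights produced by the Markov chain. The following is a minimal sketch, assuming a one-hidden-layer MLP and weight samples already drawn by some sampler such as hybrid (Hamiltonian) Monte Carlo; the function and variable names are illustrative and do not reflect the FBM interface:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP with tanh hidden units (regression output)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def predictive_mean(x, weight_samples):
    """Monte Carlo approximation of the posterior predictive mean:
    average the network output over posterior draws of the weights."""
    outputs = [mlp_forward(x, *w) for w in weight_samples]
    return np.mean(outputs, axis=0)

# Illustrative usage with simulated draws; a real run would take the draws
# from an MCMC sampler applied to the posterior of the weights.
rng = np.random.default_rng(0)
weight_samples = [
    (rng.normal(size=(5, 10)), rng.normal(size=10),
     rng.normal(size=(10, 1)), rng.normal(size=1))
    for _ in range(100)
]
x_new = rng.normal(size=(3, 5))
print(predictive_mean(x_new, weight_samples).shape)  # (3, 1)
```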

Case I: regression task in concrete quality estimation

In this section, we report results of using Bayesian MLPs for regression in a concrete quality estimation problem. The goal of the project was to develop a model for predicting the quality properties of concrete, as a part of a large quality control program of the industrial partner of the project. The quality variables included, e.g. compression strengths and densities for 1, 28 and 91 days after casting, and bleeding (water extraction), spread, slump and air-%, that measure the properties of

Case II: inverse problem in electrical impedance tomography

In this section we report results on using Bayesian MLPs for solving the ill-posed inverse problem in electrical impedance tomography (EIT). The full report of the proposed approach is presented in Lampinen, Vehtari and Leinonen (1999).

The aim in EIT is to recover the internal structure of an object from surface measurements. A number of electrodes are attached to the surface of the object and current patterns are injected through the electrodes and the resulting potentials are measured. The

Case III: classification task in forest scene analysis

In this section, we report results of using the Bayesian MLP for classification of forest scenes. The objective of the project was to assess the accuracy of estimating the volumes of growing trees from digital images. To locate the tree trunks and to initialize the fitting of the trunk contour model, a classification of the image pixels to tree and non-tree classes was necessary. The main problem in the task was the large variance in the classes. The appearance of the tree trunks varies in

Discussion and conclusions

We have reviewed the Bayesian approach for neural networks, concentrating on the MLP model and MCMC approximation for computing the marginal distribution of the end variables of the study from the joint posterior of all unknown variables given the data. In three real applications, we have assessed the performance of the Bayesian approach and compared it to other methods.

The most important advantage of the Bayesian approach in the case studies was the possibility to handle the situation where

Acknowledgements

This study was partly funded by TEKES Grant 40888/97 (Project PROMISE, Applications of Probabilistic Modeling and Search) and by the Graduate School in Electronics, Telecommunications and Automation (GETA). The authors would like to thank H. Järvenpää for providing her expertise in case study I, and K. Leinonen and J. Kaipio for aiding in the problem setup and providing the TV inverse method in case study II.

References (57)

  • S. Duane et al., Hybrid Monte Carlo, Physics Letters B (1987)
  • W.D. Penny et al., Bayesian neural networks for classification: how useful is the evidence framework?, Neural Networks (1999)
  • D. Barber et al., Ensemble learning in Bayesian neural networks
  • J.O. Berger, Statistical decision theory and Bayesian analysis, Springer series in statistics (1985)
  • J.O. Berger et al., On the development of reference priors
  • J.M. Bernardo et al., Bayesian theory (1994)
  • C.M. Bishop, Curvature-driven smoothing: a learning algorithm for feed-forward networks, IEEE Transactions on Neural Networks (1993)
  • C.M. Bishop, Neural networks for pattern recognition (1995)
  • C.M. Bishop et al., Regression with input-dependent noise: a Bayesian treatment
  • L. Breiman et al., Classification and regression trees (1984)
  • S.P. Brooks et al., Assessing convergence of Markov chain Monte Carlo algorithms, Statistics and Computing (1999)
  • W.L. Buntine et al., Bayesian back-propagation, Complex Systems (1991)
  • J.F.G. de Freitas et al., Sequential Monte Carlo methods to train neural network models, Neural Computation (2000)
  • T.G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation (1998)
  • S. Geisser, The predictive sample reuse method with applications, Journal of the American Statistical Association (1975)
  • A.E. Gelfand, Model determination using sampling-based methods
  • A. Gelman, Inference and monitoring convergence
  • A. Gelman et al., Bayesian data analysis, Texts in statistical science (1995)
  • S. Geman et al., Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence (1984)
  • J. Geweke, Bayesian treatment of the independent Student-t linear model, Journal of Applied Econometrics (1993)
  • P.K. Goel et al., Information about hyperparameters in hierarchical models, Journal of the American Statistical Association (1981)
  • W.K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika (1970)
  • D. Husmeier et al., Empirical evaluation of Bayesian sampling for neural classifiers
  • H. Jeffreys, Theory of probability (1961)
  • M.I. Jordan et al., An introduction to variational methods for graphical models
  • R.E. Kass et al., Bayes factors, Journal of the American Statistical Association (1995)
  • R.E. Kass et al., The selection of prior distributions by formal rules, Journal of the American Statistical Association (1996)