
Bayesian neural networks for nonlinear time series forecasting

Abstract

In this article, we apply Bayesian neural networks (BNNs) to time series analysis and propose a Monte Carlo algorithm for BNN training. We also take BNN model selection a step further by placing a prior on individual network connections, rather than on hidden units as other authors have done, which allows the selection of hidden units and the selection of input variables to be treated uniformly. The BNN model is compared with a number of competitors, including the Box-Jenkins model, the bilinear model, the threshold autoregressive model, and the traditional neural network model, on a number of popular and challenging data sets. Numerical results show that the BNN model consistently outperforms these competitors in forecasting future values. Our implementation also yields insights into how to improve the generalization ability of BNNs, in particular through the selection of input variables, the specification of prior distributions, and the treatment of outliers.
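
To make the connection-level prior concrete: an inclusion indicator on each individual connection means that switching off all connections into a hidden unit removes that unit, while switching off all connections out of a lagged input removes that input variable, so both selection problems reduce to sampling the same indicators. The sketch below is not the paper's algorithm; it is a minimal, hypothetical Python illustration of a one-hidden-layer BNN for one-step-ahead forecasting with independent Bernoulli priors on connections, sampled by plain Metropolis moves. All function names, priors, and settings are assumptions made for illustration.

```python
import numpy as np

# Minimal sketch only (not the paper's sampler): a one-hidden-layer BNN for
# one-step-ahead forecasting with an independent Bernoulli prior on each
# network connection. The indicators mask weights in the likelihood while
# the Gaussian prior is kept on all weights, so the parameter space is
# fixed-dimensional and plain Metropolis moves suffice. All names, priors,
# and settings here are illustrative assumptions.

rng = np.random.default_rng(0)

def embed(x, p):
    """Lagged design matrix: row t is (x[t], ..., x[t+p-1]), target x[t+p]."""
    X = np.column_stack([x[i:len(x) - p + i] for i in range(p)])
    return X, x[p:]

def forward(X, W, w, b, gW, gw):
    """Network output; gW and gw are 0/1 masks on individual connections."""
    H = np.tanh(X @ (W * gW))            # hidden-layer activations
    return b + H @ (w * gw)              # linear output unit

def log_post(y, yhat, W, w, b, gW, gw, sigma2=0.05, tau2=1.0, q=0.5):
    """Gaussian likelihood and weight prior, Bernoulli(q) connection prior."""
    ll = -0.5 * np.sum((y - yhat) ** 2) / sigma2
    lp = -0.5 * (np.sum(W ** 2) + np.sum(w ** 2) + b ** 2) / tau2
    k = gW.sum() + gw.sum()
    lp += k * np.log(q) + (gW.size + gw.size - k) * np.log(1.0 - q)
    return ll + lp

def fit_forecast(x, p=3, M=5, iters=10000, step=0.05):
    """Random-walk Metropolis on weights plus single-connection flips;
    returns the posterior-mean one-step-ahead forecast."""
    X, y = embed(x, p)
    W, w, b = rng.normal(0, 0.5, (p, M)), rng.normal(0, 0.5, M), 0.0
    gW, gw = np.ones((p, M)), np.ones(M)
    cur = log_post(y, forward(X, W, w, b, gW, gw), W, w, b, gW, gw)
    preds = []
    for it in range(iters):
        W2, w2, b2, gW2, gw2 = W, w, b, gW, gw
        if rng.random() < 0.8:           # perturb all weights and the bias
            W2 = W + step * rng.normal(size=W.shape)
            w2 = w + step * rng.normal(size=w.shape)
            b2 = b + step * rng.normal()
        elif rng.random() < 0.5:         # flip one input-to-hidden connection
            gW2 = gW.copy()
            i, j = rng.integers(p), rng.integers(M)
            gW2[i, j] = 1 - gW2[i, j]
        else:                            # flip one hidden-to-output connection
            gw2 = gw.copy()
            j = rng.integers(M)
            gw2[j] = 1 - gw2[j]
        new = log_post(y, forward(X, W2, w2, b2, gW2, gw2),
                       W2, w2, b2, gW2, gw2)
        if np.log(rng.random()) < new - cur:   # Metropolis accept/reject
            W, w, b, gW, gw, cur = W2, w2, b2, gW2, gw2, new
        if it >= iters // 2:             # average forecasts over the posterior
            preds.append(forward(x[-p:][None, :], W, w, b, gW, gw)[0])
    return np.mean(preds)

# Toy usage on a synthetic nonlinear series (illustrative only)
t = np.arange(300)
x = np.sin(0.3 * t) + 0.1 * rng.normal(size=300)
print("one-step-ahead forecast:", fit_forecast(x))
```

Because the Gaussian prior is kept on all weights and the indicators mask them only in the likelihood, the parameter space stays fixed-dimensional, so the symmetric single-connection flips are valid Metropolis moves; if inactive weights were instead dropped from the model, a trans-dimensional sampler such as reversible jump MCMC (Green 1995) would be required.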

References

  • Andrieu, C., Freitas, N.D., and Doucet, A. 2000. Reversible jump MCMC simulated annealing for neural networks. In: Uncertainty in Artificial Intelligence (UAI 2000).

  • Andrieu, C., Freitas, N.D., and Doucet, A. 2001. Robust full Bayesian learning for radial basis networks. Neural Computation 13: 2359–2407.

  • Auestad, B. and Tjøstheim, D. 1990. Identification of nonlinear time series: First order characterization and order determination. Biometrika 77: 669–687.

  • Barnett, G., Kohn, R., and Sheather, S.J. 1996. Robust estimation of an autoregressive model using Markov chain Monte Carlo. Journal of Econometrics 74: 237–254.

  • Bishop, C.M. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.

  • Box, G.E.P. and Jenkins, G.M. 1970. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco.

  • Casella, G. and Berger, R.L. 2001. Statistical Inference, 2nd ed. Duxbury, Pacific Grove, CA.

  • Chatfield, C. 2001. Time-Series Forecasting. Chapman and Hall, London.

  • Chen, C. and Liu, L.M. 1993. Forecasting time series with outliers. Journal of Forecasting 12: 13–35.

  • Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 2: 303–314.

  • Denison, D., Holmes, C., Mallick, B., and Smith, A.F.M. 2002. Bayesian Methods for Nonlinear Classification and Regression. Wiley, New York.

  • Faraway, J. and Chatfield, C. 1998. Time series forecasting with neural networks: A comparative study using the airline data. Appl. Statist. 47: 231–250.

  • Fernández, C., Ley, E., and Steel, M.F.J. 2001. Benchmark priors for Bayesian model averaging. Journal of Econometrics 100: 381–427.

  • Freitas, N. and Andrieu, C. 2000. Sequential Monte Carlo for model selection and estimation of neural networks. In: Proceedings of ICASSP 2000.

  • Freitas, N., Andrieu, C., Højen-Sørensen, P., Niranjan, M., and Gee, A. 2001. Sequential Monte Carlo methods for neural networks. In: Doucet A., de Freitas N., and Gordon N. (Eds.), Sequential Monte Carlo Methods in Practice. Springer-Verlag.

  • Funahashi, K. 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks 2: 183–192.

  • Gabr, M.M. and Subba Rao, T. 1981. The estimation and prediction of subset bilinear time series models with applications. Journal of Time Series Analysis 2: 155–171.

  • Gelman, A., Roberts, G.O., and Gilks, W.R. 1996. Efficient Metropolis jumping rules. In: Bernardo J.M., Berger J.O., Dawid A.P., and Smith A.F.M. (Eds.), Bayesian Statistics 5. Oxford University Press, New York.

  • Gelman, A. and Rubin, D.B. 1992. Inference from iterative simulation using multiple sequences (with discussion). Statist. Sci. 7: 457–472.

  • Gerlach, G., Carter, C.K., and Kohn, R. 1999. Diagnostics for time series analysis. Journal of Time Series Analysis.

  • Geyer, C.J. 1991. Markov chain Monte Carlo maximum likelihood. In: Keramidas E.M. (Ed.), Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. Interface Foundation, Fairfax Station, pp. 153–163.

  • Goldberg, D.E. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, MA.

  • Green, P.J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.

  • Härdle, W. and Vieu, P. 1992. Kernel regression smoothing of time series. Journal of Time Series Analysis 13: 209–232.

  • Hastings, W.K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109.

  • Higdon, D., Lee, H., and Bi, Z. 2002. A Bayesian approach to characterizing uncertainty in inverse problems using coarse and fine-scale information. IEEE Transactions on Signal Processing 50: 389–399.

  • Hill, T., O’Connor, M., and Remus, W. 1996. Neural network models for time series forecasts. Management Science 42: 1082–1092.

  • Holland, J.H. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.

  • Holmes, C.C. and Denison, D. 2002. A Bayesian MARS classifier. Machine Learning, to appear.

  • Holmes, C.C. and Mallick, B.K. 1998. Bayesian radial basis functions of variable dimension. Neural Computation 10: 1217–1233.

  • Hornik, K., Stinchcombe, M., and White, H. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2: 359–366.

  • Hukushima, K. and Nemoto, K. 1996. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 65: 1604–1608.

  • Kang, S. 1991. An investigation of the use of feedforward neural networks for forecasting. Ph.D. Dissertation, Kent State University, Kent, Ohio.

  • Liang, F., Truong, Y.K., and Wong, W.H. 2001. Automatic Bayesian model averaging for linear regression and applications in Bayesian curve fitting. Statistica Sinica 11: 1005–1029.

  • Liang, F. and Wong, W.H. 2000. Evolutionary Monte Carlo: Applications to C_p model sampling and change point problem. Statistica Sinica 10: 317–342.

  • Liang, F. and Wong, W.H. 2001. Real parameter evolutionary Monte Carlo with applications in Bayesian mixture models. J. Amer. Statist. Assoc. 96: 653–666.

  • Lim, K.S. 1987. A comparative study of various univariate time series models for Canadian lynx data. Journal of Time Series Analysis 8: 161–176.

  • MacKay, D.J.C. 1992. A practical Bayesian framework for backpropagation networks. Neural Computation 4: 448–472.

  • Mallows, C.L. 1973. Some comments on C_p. Technometrics 15: 661–676.

  • Marinari, E. and Parisi, G. 1992. Simulated tempering: A new Monte Carlo scheme. Europhysics Letters 19: 451–458.

  • Marrs, A.D. 1998. An application of reversible-jump MCMC to multivariate spherical Gaussian mixtures. In: Advances in Neural Information Processing Systems 10. Morgan Kaufmann, San Mateo, CA, pp. 577–583.

  • Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. 1953. Equation of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087–1091.

  • Müller, P. and Insua, D.R. 1998. Issues in Bayesian analysis of neural network models. Neural Computation 10: 749–770.

  • Neal, R.M. 1996. Bayesian Learning for Neural Networks. Springer-Verlag, New York.

  • Nicholls, D.F. and Quinn, B.G. 1982. Random Coefficient Autoregressive Models: An Introduction. Springer-Verlag, New York.

  • Park, Y.R., Murray, T.J., and Chen, C. 1996. Predicting sunspots using a layered perceptron neural network. IEEE Trans. Neural Networks 7: 501–505.

  • Penny, W.D. and Roberts, S.J. 1999. Bayesian neural networks for classification: How useful is the evidence framework? Neural Networks, 12: 877–892.

  • Penny, W.D. and Roberts, S.J. 2000. Bayesian methods for autoregressive models. In: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, Sydney, December 2000.

  • Raftery, A.E., Madigan, D., and Hoeting, J.A. 1997. Bayesian model averaging for linear regression models. J. Amer. Statist. Assoc. 92: 179–191.

  • Rumelhart, D., Hinton, G., and Williams, R. 1986. Learning internal representations by error propagation. In: Rumelhart D. and McClelland J. (Eds.), Parallel Distributed Processing. MIT Press, Cambridge, pp. 318–362.

  • Rumelhart, D. and McClelland, J. 1986. Parallel Distributed Processing. MIT Press, Cambridge.

  • Smith, A.F.M. and Roberts, G.O. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). J. Royal Statist. Soc. B, 55: 3–23.

  • Subba Rao, T. and Gabr, M.M. 1984. An Introduction to Bispectral Analysis and Bilinear Time Series Models. Springer, New York.

  • Tong, H. 1990. Non-Linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.

  • Tong, H. and Lim, K.S. 1980. Threshold autoregression, limit cycles and cyclical data (with discussion). J. R. Statist. Soc. B 42: 245–292.

  • Waldmeier, M. 1961. The Sunspot Activity in the Years 1610–1960. Schulthess, Zürich.

  • Weigend, A.S., Huberman, B.A., and Rumelhart, D.E. 1990. Predicting the future: A connectionist approach. Int. J. Neural Syst. 1: 193–209.

  • Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. 1991. Generalization by weight-elimination with application to forecasting. In: Advances in Neural Information Processing Systems 3. Morgan Kaufmann, San Mateo, CA, pp. 875–882.

Author information

Correspondence to Faming Liang.

About this article

Cite this article

Liang, F. Bayesian neural networks for nonlinear time series forecasting. Stat Comput 15, 13–29 (2005). https://doi.org/10.1007/s11222-005-4786-8
