Summary
A computational framework for estimating multivariate conditional distributions is presented. It allows the joint distribution of target variables to be forecast as a function of explanatory variables. The concept can be applied to general distribution families such as stable or hyperbolic distributions. Estimation is based on numerical minimization of the cross entropy, using the Multi-Level Single-Linkage global optimization method. Nonlinear dependencies of the conditional parameters can be modeled with the help of general function approximators such as multi-layer perceptrons. In applications, information about the complete forecast distribution can be used to quantify the reliability of the forecast or to support decisions. This is illustrated in a case study on spare parts demand forecasting. The reduction in forecast error achieved by using non-Gaussian distributions is presented in a second case study on truck sales forecasting.
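The estimation principle summarized above, fitting conditional distribution parameters by minimizing the empirical cross entropy, can be illustrated with a minimal sketch. It is not the method of the article: it assumes a univariate Gaussian family instead of stable or hyperbolic distributions, linear functions of a single explanatory variable instead of a multi-layer perceptron, and plain gradient descent with backtracking instead of Multi-Level Single-Linkage. All names (`cross_entropy`, `gradient`, `fit`, `make_data`) and the synthetic data are hypothetical.

```python
import math
import random


def cross_entropy(theta, data):
    """Empirical cross entropy: mean negative log density of y given x.

    Assumed model (illustration only): y | x ~ Normal(mu(x), sigma(x)) with
    mu(x) = w0 + w1*x and log sigma(x) = s0 + s1*x.
    """
    w0, w1, s0, s1 = theta
    total = 0.0
    for x, y in data:
        log_sigma = s0 + s1 * x
        z = (y - (w0 + w1 * x)) / math.exp(log_sigma)
        total += 0.5 * math.log(2.0 * math.pi) + log_sigma + 0.5 * z * z
    return total / len(data)


def gradient(theta, data):
    """Analytic gradient of the cross entropy with respect to theta."""
    w0, w1, s0, s1 = theta
    g = [0.0, 0.0, 0.0, 0.0]
    for x, y in data:
        sigma = math.exp(s0 + s1 * x)
        z = (y - (w0 + w1 * x)) / sigma
        d_mu = -z / sigma          # derivative with respect to mu(x)
        d_ls = 1.0 - z * z         # derivative with respect to log sigma(x)
        g[0] += d_mu
        g[1] += d_mu * x
        g[2] += d_ls
        g[3] += d_ls * x
    return [gi / len(data) for gi in g]


def fit(data, steps=500, lr=0.1):
    """Local minimization by gradient descent with step halving.

    A step is accepted only if it lowers the cross entropy, so the
    returned loss never exceeds the loss at the starting point.
    """
    theta = [0.0, 0.0, 0.0, 0.0]
    loss = cross_entropy(theta, data)
    for _ in range(steps):
        g = gradient(theta, data)
        step = lr
        while step > 1e-12:
            cand = [t - step * gi for t, gi in zip(theta, g)]
            cand_loss = cross_entropy(cand, data)
            if cand_loss < loss:
                theta, loss = cand, cand_loss
                break
            step *= 0.5
    return theta, loss


def make_data(n=200, seed=0):
    """Synthetic heteroscedastic sample (hypothetical, for illustration)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, rng.gauss(1.0 + 2.0 * x, math.exp(-1.0 + x))))
    return data


if __name__ == "__main__":
    sample = make_data()
    theta_hat, ce = fit(sample)
    print("fitted parameters:", theta_hat)
    print("cross entropy at optimum:", ce)
```

Because the fitted model returns a location and a scale for every value of the explanatory variable, the spread of the forecast distribution is available alongside the point forecast, which is the kind of reliability information the summary refers to. A local search of this kind may stop in a local minimum for richer models, which is one motivation for a global method.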





Cite this article
Stützle, E.A., Hrycej, T. Numerical method for estimating multivariate conditional distributions. Computational Statistics 20, 151–176 (2005). https://doi.org/10.1007/BF02736128
DOI: https://doi.org/10.1007/BF02736128