Abstract
Consider data $(x_1, y_1), \ldots, (x_n, y_n)$, where each $x_i$ may be vector valued and the distribution of $y_i$ given $x_i$ is a mixture of linear regressions. This generalizes mixture models that do not include covariates in the mixture formulation. The mixture of linear regressions formulation has appeared in the computer science literature under the name “Hierarchical Mixtures of Experts” model.
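In the standard notation, the model above can be written as $p(y_i \mid x_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(y_i;\, x_i^\top \beta_k,\, \sigma_k^2)$, with mixing weights $\pi_k$, component coefficient vectors $\beta_k$ and component variances $\sigma_k^2$. This notation is an assumption for illustration; the paper's own notation may differ. The minimal Python sketch below evaluates that conditional density and the resulting log-likelihood. It assumes mixing weights that do not depend on $x_i$ (in the Hierarchical Mixtures of Experts literature the weights are often allowed to vary with the covariates through a gating function), and the function names are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixreg_density(y, x, weights, betas, sds):
    """p(y | x) = sum_k pi_k * N(y; x' beta_k, sigma_k^2).

    x is a covariate vector (include a leading 1.0 for an intercept);
    betas[k] is the coefficient vector of component k and sds[k] its
    standard deviation. The weights are assumed constant in x.
    """
    total = 0.0
    for pi_k, beta_k, sd_k in zip(weights, betas, sds):
        mean_k = sum(xj * bj for xj, bj in zip(x, beta_k))
        total += pi_k * normal_pdf(y, mean_k, sd_k)
    return total


def mixreg_loglik(ys, xs, weights, betas, sds):
    """Log-likelihood of (x_1, y_1), ..., (x_n, y_n) under the mixture."""
    return sum(math.log(mixreg_density(y, x, weights, betas, sds))
               for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Two hypothetical lines, y = 1 + 2x and y = 5 - x, mixed 50/50.
    xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # leading 1.0 = intercept
    ys = [1.2, 2.9, 3.1]
    print(mixreg_loglik(ys, xs, [0.5, 0.5], [[1.0, 2.0], [5.0, -1.0]], [1.0, 1.0]))
```

With a single component ($K = 1$, weight 1) the density reduces to an ordinary normal linear regression, which is the sense in which the mixture generalizes it.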
This model has been considered from both frequentist and Bayesian viewpoints; we focus on the Bayesian formulation. Previously, the mixture of linear regressions model has been estimated by straightforward Gibbs sampling with latent variables. This paper contributes to the field in three major areas. First, we provide a theoretical underpinning for the Bayesian implementation by demonstrating consistency of the posterior distribution. We do so by extending the bracketing entropy results of Barron, Schervish and Wasserman (Annals of Statistics 27: 536–561, 1999) to the regression setting. Second, we show through examples that straightforward Gibbs sampling may fail to explore the posterior distribution effectively, and we provide alternative algorithms that are more accurate. Third, we demonstrate the usefulness of the mixture of linear regressions framework for Bayesian robust regression. The methods described in the paper are applied to two examples.
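The “straightforward Gibbs sampling with latent variables” mentioned above introduces an allocation $z_i \in \{1, \ldots, K\}$ for each observation and cycles through three conditional draws: component parameters given the allocations, mixing weights given the allocation counts, and allocations given the parameters. The sketch below is a minimal version for simple (one-covariate) regressions. Its priors are assumptions chosen for conjugacy, not taken from the paper: $\beta_k \sim \mathcal{N}(0, \tau^2 I)$, $\sigma_k^2 \sim \text{Inverse-Gamma}(a, b)$ and a symmetric $\text{Dirichlet}(\alpha)$ on the weights. It is the baseline sampler that the paper argues may fail to explore the posterior effectively, not one of the paper's alternative algorithms, and `gibbs_mixreg` and its parameters are hypothetical names.

```python
import math
import random


def _sample_beta(xs, ys, sigma2, tau2, rng):
    """Draw (intercept, slope) from its Gaussian full conditional.

    Assumed prior: beta ~ N(0, tau2 * I); likelihood y = b0 + b1*x + N(0, sigma2).
    """
    n, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
    # Posterior precision P = X'X / sigma2 + I / tau2 (2x2, symmetric).
    p00, p01, p11 = n / sigma2 + 1.0 / tau2, sx / sigma2, sxx / sigma2 + 1.0 / tau2
    r0 = sum(ys) / sigma2                                  # (X'y)[0] / sigma2
    r1 = sum(x * y for x, y in zip(xs, ys)) / sigma2       # (X'y)[1] / sigma2
    # Posterior covariance V = P^{-1} and mean m = V r.
    det = p00 * p11 - p01 * p01
    v00, v01, v11 = p11 / det, -p01 / det, p00 / det
    m0, m1 = v00 * r0 + v01 * r1, v01 * r0 + v11 * r1
    # Cholesky factor L of V, so m + L z ~ N(m, V) for z ~ N(0, I).
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (m0 + l00 * z0, m1 + l10 * z0 + l11 * z1)


def _log_normal(y, mean, sigma2):
    return -0.5 * math.log(2.0 * math.pi * sigma2) - 0.5 * (y - mean) ** 2 / sigma2


def gibbs_mixreg(xs, ys, k, n_iter=2000, tau2=100.0, a=2.0, b=1.0, alpha=1.0, seed=0):
    """Latent-allocation Gibbs sampler for a K-component mixture of simple
    linear regressions. Returns a list of (weights, betas, sigma2s) draws."""
    rng = random.Random(seed)
    n = len(xs)
    z = [rng.randrange(k) for _ in range(n)]  # random initial allocations
    betas, sigma2s = [(0.0, 0.0)] * k, [1.0] * k
    draws = []
    for _ in range(n_iter):
        # 1. Coefficients and variance of each component given allocations.
        for j in range(k):
            xj = [x for x, zi in zip(xs, z) if zi == j]
            yj = [y for y, zi in zip(ys, z) if zi == j]
            betas[j] = _sample_beta(xj, yj, sigma2s[j], tau2, rng)
            b0, b1 = betas[j]
            ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xj, yj))
            # Inverse-gamma prior => precision ~ Gamma(a + n_j/2, rate = b + ssr/2).
            prec = rng.gammavariate(a + len(xj) / 2.0, 1.0 / (b + ssr / 2.0))
            sigma2s[j] = 1.0 / prec
        # 2. Weights given allocation counts: Dirichlet(alpha + n_j) via gammas.
        g = [rng.gammavariate(alpha + z.count(j), 1.0) for j in range(k)]
        weights = [gj / sum(g) for gj in g]
        # 3. Each allocation given the current parameters.
        for i in range(n):
            logp = [math.log(weights[j])
                    + _log_normal(ys[i], betas[j][0] + betas[j][1] * xs[i], sigma2s[j])
                    for j in range(k)]
            top = max(logp)
            z[i] = rng.choices(range(k), weights=[math.exp(lp - top) for lp in logp])[0]
        draws.append((list(weights), list(betas), list(sigma2s)))
    return draws


if __name__ == "__main__":
    # Hypothetical synthetic data from two lines: y = 1 + 2x and y = 20 - x.
    data_rng = random.Random(1)
    xs = [data_rng.uniform(0.0, 10.0) for _ in range(200)]
    ys = [(1.0 + 2.0 * x if i % 2 == 0 else 20.0 - x) + data_rng.gauss(0.0, 0.5)
          for i, x in enumerate(xs)]
    weights, betas, sigma2s = gibbs_mixreg(xs, ys, k=2, n_iter=500)[-1]
    print("weights:", weights, "betas:", betas, "sigma2s:", sigma2s)
```

Because each allocation is updated conditionally on all the others, a chain can remain near its starting configuration for a long time. Running it from several seeds and comparing the draws is one simple check on whether it moves between posterior modes, the kind of difficulty discussed by Celeux, Hurn and Robert (2000) in the reference list.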
References
Barron A., Schervish M., and Wasserman L. 1999. The consistency of posterior distributions in nonparametric problems. Annals of Statistics 27: 536-561.
Billingsley P. 1986. Probability and Measure, 2nd Edn. John Wiley and Sons.
Celeux G., Hurn M., and Robert C. 2000. Computational and inferential difficulties with mixture posterior distributions. Journal of the American Statistical Association 95: 957-970.
Cohen E. 1980. Inharmonic tone perception. Ph.D. Dissertation, Stanford University.
Dacunha-Castelle D. and Gassiat E. 1997. Testing in locally conic models, and application to mixture models. ESAIM Probability and Statistics 1: 285-317.
Davenport J., Bezdek J., and Hathaway R. 1988. Parameter estimation for finite mixture distributions. Computers and Mathematics with Applications 15: 819-828.
Diebolt J. and Robert C. 1994. Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society B 56: 363-375.
DeVeaux R. 1989. Mixtures of linear regressions. Computational Statistics and Data Analysis 8: 227-245.
Feng Z. and McCulloch C. 1994. On the likelihood ratio test statistic for the number of components in a normal mixture with unequal variances. Biometrics 50: 1158-1169.
Gelman A., Carlin J., Stern H., and Rubin D. 1995. Bayesian Data Analysis. Chapman and Hall.
Genovese C. and Wasserman L. 2000. Rates of convergence for the Gaussian mixture sieve. Annals of Statistics 28: 1105-1127.
Hawkins D., Bradu D., and Kass G. 1984. Location of several outliers in multiple regression using elemental sets. Technometrics 26: 197-208.
Hurn M., Justel A., and Robert C. 2000. Estimating mixtures of regressions. Technical Report, Department of Mathematics, University of Bath.
Jiang W. and Tanner M. 1999. Hierarchical mixtures of experts for exponential family regression models: Approximation and maximum likelihood estimation. Annals of Statistics 27: 987-1011.
Jordan M. and Jacobs R. 1994. Hierarchical mixtures of experts and the EM algorithm. Neural Computation 6: 181-214.
Justel A. and Peña D. 1996. Gibbs sampling will fail in outlier problems with strong masking. Journal of Computational and Graphical Statistics 5: 176-189.
Kass R. and Raftery A. 1995. Bayes factors. Journal of the American Statistical Association 90: 773-795.
Keribin C. 1998. Consistent estimation of the order of mixture models. Technical Report, Laboratoire Analyse et Probabilité, Université d'Évry-Val d'Essonne.
Mengersen K. and Robert C. 1996. Testing for mixtures: A Bayesian entropic approach. In: Bernardo J., Berger J., Dawid P., and Smith A. (Eds.), Bayesian Statistics 5. Oxford University Press, pp. 255-276.
Müller P., Erkanli A., and West M. 1996. Bayesian curve fitting using multivariate normal mixtures. Biometrika 83: 67-79.
Peng F., Jacobs R., and Tanner M. 1996. Bayesian inference in mixtures of experts and hierarchical mixtures of experts models with an application to speech recognition. Journal of the American Statistical Association 91: 953-960.
Richardson S. and Green P. 1997. On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society B 59: 731-792.
Robert C. 1996. Mixtures of distributions: Inference and estimation. In: Gilks W., Richardson S., and Spiegelhalter D. (Eds.), Practical Markov Chain Monte Carlo. Chapman and Hall, Ch. 24.
Roeder K. and Wasserman L. 1997. Practical Bayesian density estimation using mixtures of normals. Journal of the American Statistical Association 92: 894-902.
Rousseeuw P. 1984. Least median of squares regression. Journal of the American Statistical Association 79: 871-880.
Rousseeuw P. and van Zomeren B. 1990. Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association 85: 633-651.
Stephens M. 2000. Bayesian analysis of mixture models with an unknown number of components—An alternative to reversible jump methods. Annals of Statistics 28: 40-74.
Tierney L. 1994. Markov Chains for exploring posterior distributions (with discussion). Annals of Statistics 22: 1701-1762.
Verdinelli I. and Wasserman L. 1991. Bayesian analysis of outlier problems using the Gibbs sampler. Statistics and Computing 1: 105-117.
Wasserman L. 2000. Asymptotic inference for mixture models using data dependent priors. Journal of the Royal Statistical Society B 62: 159-180.
Waterhouse S., MacKay D., and Robinson T. 1996. Bayesian methods for mixtures of experts. In: Touretzky D., Mozer M., and Hasselmo M. (Eds.), Advances in Neural Information Processing Systems 8. MIT Press, Cambridge, MA.
Weisberg S. 1985. Applied Linear Regression. John Wiley and Sons.
Cite this article
Viele, K., Tong, B. Modeling with Mixtures of Linear Regressions. Statistics and Computing 12, 315–330 (2002). https://doi.org/10.1023/A:1020779827503