Multivariate linear regression with non-normal errors: a solution based on mixture models

Soffritti, Gabriele; Galimberti, Giuliano

doi:10.1007/s11222-010-9190-3

Published: 17 June 2010

Statistics and Computing, volume 21, pages 523–536 (2011)

Gabriele Soffritti¹ & Giuliano Galimberti¹

Abstract

In some situations, the distribution of the error terms of a multivariate linear regression model may depart from normality. This problem has been addressed, for example, by specifying a different parametric family for the error distribution, such as multivariate skewed and/or heavy-tailed distributions. A new solution is proposed, in which the error distribution is modelled as a finite mixture of multivariate Gaussian components. The multivariate linear regression model is studied under this assumption. Identifiability conditions are proved, and maximum likelihood estimation of the model parameters is performed using the EM algorithm. The number of mixture components is chosen through model selection criteria; when this number equals one, the proposal reduces to the classical approach. The performance of the proposed approach is evaluated through Monte Carlo experiments and compared with that of other approaches. Finally, results from the analysis of a real dataset are presented.
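
The estimation idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices in it are assumptions made for illustration only. The regression intercept and the component means of the error mixture are merged into component-specific intercepts `lam[k]`, because if the errors are a mixture of N(nu_k, Sigma_k) then y given x is a mixture of N(beta_0 + nu_k + B'x, Sigma_k). The slope matrix `B` is shared by all components. The M-step is split into two conditional maximisations (an ECM variant of EM). The starting partition is a simple deterministic split of the least-squares residuals. BIC in the Schwarz form (lower is better) stands in for the unspecified "model selection criteria". The function name `fit_mixture_error_regression` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal


def fit_mixture_error_regression(X, Y, K, n_iter=500, tol=1e-8):
    """ECM sketch: y_i = lam_k + B'x_i + e_i, e_i ~ N(0, Sigma_k) with prob. pi_k.

    X is (n, p) without an intercept column, Y is (n, d).
    """
    n, p = X.shape
    d = Y.shape[1]
    # Initial slopes: ordinary least squares on centred data.
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    B = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
    # Initial hard partition: K equal-sized groups along the first principal
    # axis of the OLS residuals (an arbitrary, deterministic choice).
    R0 = Yc - Xc @ B
    score = R0 @ np.linalg.svd(R0, full_matrices=False)[2][0]
    ranks = np.argsort(np.argsort(score))
    tau = np.eye(K)[(ranks * K) // n]
    prev = -np.inf
    for _ in range(n_iter):
        # CM-step 1: weights, intercepts and covariances given B.
        R = Y - X @ B
        nk = tau.sum(axis=0)
        pi = nk / n
        lam = (tau.T @ R) / nk[:, None]
        Sigma = np.empty((K, d, d))
        for k in range(K):
            E = R - lam[k]
            S = (tau[:, k, None] * E).T @ E / nk[k]
            Sigma[k] = (S + S.T) / 2.0  # enforce exact symmetry
        # CM-step 2: shared slopes given the rest. Solves
        # sum_k (X'T_k X) B P_k = sum_k X'T_k (Y - lam_k) P_k via vec().
        A = np.zeros((p * d, p * d))
        rhs = np.zeros((p, d))
        for k in range(K):
            P = np.linalg.inv(Sigma[k])
            Xw = X * tau[:, k, None]
            A += np.kron(P, Xw.T @ X)
            rhs += Xw.T @ (Y - lam[k]) @ P
        B = np.linalg.solve(A, rhs.reshape(-1, order="F")).reshape((p, d), order="F")
        # E-step: posterior component probabilities and log-likelihood.
        R = Y - X @ B
        logp = np.column_stack([
            np.log(pi[k]) + multivariate_normal.logpdf(R, mean=lam[k], cov=Sigma[k])
            for k in range(K)
        ])
        row = logsumexp(logp, axis=1, keepdims=True)
        tau = np.exp(logp - row)
        loglik = float(row.sum())
        if loglik - prev < tol:
            break
        prev = loglik
    # Free parameters: weights, intercepts, slopes, covariance matrices.
    n_par = (K - 1) + K * d + p * d + K * d * (d + 1) // 2
    bic = -2.0 * loglik + n_par * np.log(n)
    return {"pi": pi, "lam": lam, "B": B, "Sigma": Sigma,
            "loglik": loglik, "bic": bic}


# Simulated example with bimodal (hence non-normal) errors.
rng = np.random.default_rng(1)
n = 600
X = rng.normal(size=(n, 2))
B_true = np.array([[1.0, -0.5], [0.5, 2.0]])
z = rng.integers(0, 2, size=n)
shift = np.where(z[:, None] == 0, -3.0, 3.0)
Y = 1.0 + X @ B_true + shift + rng.normal(size=(n, 2))

fits = {K: fit_mixture_error_regression(X, Y, K) for K in (1, 2)}
best_K = min(fits, key=lambda K: fits[K]["bic"])
print(best_K, np.round(fits[best_K]["B"], 2))
```

With `K = 1` the sketch returns the ordinary least-squares slopes, in line with the abstract's remark that a single component recovers the classical approach. The sketch has no safeguard against empty components or singular covariance matrices, which a practical implementation would need.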

Author information

Authors and Affiliations

Department of Statistics, University of Bologna, via delle Belle Arti 41, 40126, Bologna, Italy
Gabriele Soffritti & Giuliano Galimberti

Corresponding author

Correspondence to Gabriele Soffritti.

About this article

Cite this article

Soffritti, G., Galimberti, G. Multivariate linear regression with non-normal errors: a solution based on mixture models. Stat Comput 21, 523–536 (2011). https://doi.org/10.1007/s11222-010-9190-3

Received: 15 December 2009
Accepted: 31 May 2010
Published: 17 June 2010
Issue Date: October 2011
DOI: https://doi.org/10.1007/s11222-010-9190-3
