Seemingly unrelated clusterwise linear regression

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Linear regression models based on finite Gaussian mixtures are a flexible tool for analysing linear dependencies in multivariate data. They are suited to correlated response variables when the data come from a heterogeneous population composed of two or more sub-populations, each characterised by a different linear regression model. Several types of finite mixtures of linear regression models have been specified by changing the assumptions on the parameters that differentiate the sub-populations and/or the vectors of regressors that affect the response variables. They are made more flexible by the class of models illustrated in this paper, defined by mixtures of seemingly unrelated Gaussian linear regressions, which allow a different vector of regressors to be used for each dependent variable. The proposed class includes parsimonious models obtained by imposing suitable constraints on the variances and covariances of the response variables in the sub-populations. Details on model identification and maximum likelihood estimation are given, and the usefulness of these models is shown through the analysis of a real dataset. Regularity conditions for the model class are illustrated, and a proof is provided that, when these conditions are met, the maximum likelihood estimator is consistent under the examined models. In addition, the finite-sample behaviour of this estimator is evaluated numerically through the analysis of simulated datasets.
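
As a rough illustration of the model class described in the abstract, the following minimal Python sketch evaluates the density of a two-component mixture of bivariate seemingly unrelated Gaussian regressions and the posterior probability of each sub-population for a single observation. It is not the authors' implementation: the function names, the restriction to two responses with one regressor each, and all parameter values are assumptions made only for this example.

```python
import math

def bivariate_normal_pdf(y, mean, cov):
    """Density of a bivariate Gaussian at y; cov is ((s11, s12), (s12, s22))."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = y[0] - mean[0], y[1] - mean[1]
    # Quadratic form d' cov^{-1} d, using the closed-form 2x2 inverse.
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def component_mean(x1, x2, beta):
    """Seemingly unrelated structure: response 1 uses only x1, response 2 only x2."""
    (a1, b1), (a2, b2) = beta
    return (a1 + b1 * x1, a2 + b2 * x2)

def posterior_probs(y, x1, x2, weights, betas, covs):
    """Return (posterior probabilities of the sub-populations, mixture density at y)."""
    joint = [
        w * bivariate_normal_pdf(y, component_mean(x1, x2, b), c)
        for w, b, c in zip(weights, betas, covs)
    ]
    total = sum(joint)  # mixture density: weighted sum of component densities
    return [j / total for j in joint], total

# Hypothetical parameter values, chosen only for illustration.
weights = [0.6, 0.4]                        # mixing proportions
betas = [((0.0, 1.0), (1.0, -0.5)),         # (intercept, slope) per response, component 1
         ((3.0, -1.0), (-2.0, 0.5))]        # component 2
covs = [((1.0, 0.3), (0.3, 1.0)),           # error covariance matrix, component 1
        ((0.5, -0.1), (-0.1, 0.8))]         # component 2

probs, density = posterior_probs((1.0, 0.5), 1.0, 1.0, weights, betas, covs)
print(probs, density)
```

The nonzero off-diagonal covariance terms are what link the two equations within each sub-population even though they share no regressors. The posterior probabilities computed here correspond to the quantities an EM-type algorithm would update at each iteration, although the estimation procedure itself is not reproduced in this sketch.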

Author information

Corresponding author

Correspondence to Gabriele Soffritti.

About this article

Cite this article

Galimberti, G., Soffritti, G. Seemingly unrelated clusterwise linear regression. Adv Data Anal Classif 14, 235–260 (2020). https://doi.org/10.1007/s11634-019-00369-4

  • DOI: https://doi.org/10.1007/s11634-019-00369-4
