Abstract
Linear regression models based on finite Gaussian mixtures are a flexible tool for analysing linear dependencies in multivariate data. They are suited to correlated response variables when the data come from a heterogeneous population composed of two or more sub-populations, each characterised by a different linear regression model. Several types of finite mixtures of linear regression models have been specified by varying the assumptions on the parameters that differentiate the sub-populations and/or on the vectors of regressors that affect the response variables. The class of models illustrated in this paper, defined by mixtures of seemingly unrelated Gaussian linear regressions, makes them more flexible: the researcher can use a different vector of regressors for each dependent variable. The proposed class includes parsimonious models obtained by imposing suitable constraints on the variances and covariances of the response variables in the sub-populations. Details about model identification and maximum likelihood estimation are given. The usefulness of these models is shown through the analysis of a real dataset. Regularity conditions for the model class are illustrated, and a proof is provided that, when these conditions are met, the maximum likelihood estimator is consistent under the examined models. In addition, the finite-sample behaviour of this estimator is evaluated numerically through the analysis of simulated datasets.
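As an illustration only, the log-likelihood of a mixture of seemingly unrelated Gaussian linear regressions might be evaluated as in the sketch below. It assumes that each component has its own intercepts, slopes and covariance matrix, and that each response has its own regressor matrix. The function names, argument layout and parameterisation are hypothetical choices made for this example, not the authors' notation or software.

```python
import numpy as np


def mvn_logpdf(y, mean, cov):
    """Row-wise log-density of a multivariate Gaussian.

    y and mean have shape (n, D); cov has shape (D, D).
    """
    d = y.shape[1]
    chol = np.linalg.cholesky(cov)
    # Whitened residuals, shape (D, n)
    z = np.linalg.solve(chol, (y - mean).T)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def sur_mixture_loglik(y, regressors, weights, intercepts, slopes, covs):
    """Log-likelihood of a K-component mixture of seemingly unrelated regressions.

    All argument names are assumptions made for this sketch:
    y          : (n, D) array of responses
    regressors : list of D arrays; regressors[d] has shape (n, p_d),
                 so each response may use a different set of regressors
    weights    : (K,) mixing proportions summing to one
    intercepts : (K, D) component-specific intercepts
    slopes     : slopes[k][d] is a (p_d,) coefficient vector
    covs       : (K, D, D) component-specific covariance matrices
    """
    n, n_resp = y.shape
    n_comp = len(weights)
    log_terms = np.empty((n, n_comp))
    for k in range(n_comp):
        # Conditional mean of each response under component k
        mean = np.column_stack(
            [intercepts[k][d] + regressors[d] @ slopes[k][d] for d in range(n_resp)]
        )
        log_terms[:, k] = np.log(weights[k]) + mvn_logpdf(y, mean, covs[k])
    # Log-sum-exp over components for numerical stability
    m = log_terms.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))))
```

A function of this kind could serve as the objective monitored by an EM-type algorithm. Constraints on the covariance matrices, such as those that yield the parsimonious models, would have to be imposed separately.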
Cite this article
Galimberti, G., Soffritti, G. Seemingly unrelated clusterwise linear regression. Adv Data Anal Classif 14, 235–260 (2020). https://doi.org/10.1007/s11634-019-00369-4
DOI: https://doi.org/10.1007/s11634-019-00369-4