Seemingly unrelated clusterwise linear regression

Galimberti, Giuliano; Soffritti, Gabriele

doi:10.1007/s11634-019-00369-4

Regular Article
Published: 12 August 2019

Volume 14, pages 235–260, (2020)

Advances in Data Analysis and Classification

Abstract

Linear regression models based on finite Gaussian mixtures represent a flexible tool for the analysis of linear dependencies in multivariate data. They are suitable for dealing with correlated response variables when data come from a heterogeneous population composed of two or more sub-populations, each of which is characterised by a different linear regression model. Several types of finite mixtures of linear regression models have been specified by changing the assumptions on the parameters that differentiate the sub-populations and/or the vectors of regressors that affect the response variables. They are made more flexible in the class of models defined by mixtures of seemingly unrelated Gaussian linear regressions illustrated in this paper. With these models, the researcher is enabled to use a different vector of regressors for each dependent variable. The proposed class includes parsimonious models obtained by imposing suitable constraints on the variances and covariances of the response variables in the sub-populations. Details about the model identification and maximum likelihood estimation are given. The usefulness of these models is shown through the analysis of a real dataset. Regularity conditions for the model class are illustrated and a proof is provided that, when these conditions are met, the consistency of the maximum likelihood estimator under the examined models is ensured. In addition, the behaviour of this estimator in the presence of finite samples is numerically evaluated through the analysis of simulated datasets.
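To make the model class described in the abstract concrete, the following is a minimal sketch of the log-likelihood of a finite mixture of Gaussian seemingly unrelated regressions, in which each response variable has its own regressor vector. It is not the authors' code. The function names, the argument layout, and the parameterisation (mixing weights, one coefficient vector per response and component, one unconstrained covariance matrix per component) are assumptions made for illustration, and the parsimonious covariance constraints mentioned in the abstract are not implemented.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_gaussian(resid, cov):
    """Log density of a zero-mean multivariate Gaussian evaluated at resid."""
    d = len(resid)
    L = cholesky(cov)
    # Forward substitution: solve L z = resid.
    z = []
    for i in range(d):
        z.append((resid[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def sur_mixture_loglik(Y, X, weights, betas, covs):
    """Log-likelihood of a K-component mixture of Gaussian SUR models (hypothetical layout).

    Y[i][d]     : response d for observation i
    X[d][i]     : regressor vector of observation i for response d
                  (each response has its own regressors; include 1.0 for an intercept)
    weights[k]  : mixing proportion of component k
    betas[k][d] : coefficient vector for response d in component k
    covs[k]     : D x D covariance matrix of component k
    """
    total = 0.0
    for i, y in enumerate(Y):
        terms = []
        for w, beta_k, cov_k in zip(weights, betas, covs):
            # Residual of each response under its own regression equation.
            resid = [
                y[d] - sum(b * x for b, x in zip(beta_k[d], X[d][i]))
                for d in range(len(y))
            ]
            terms.append(math.log(w) + log_gaussian(resid, cov_k))
        m = max(terms)  # log-sum-exp over components for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

In this sketch the first response could be regressed on an intercept only and the second on an intercept plus one covariate, simply by passing regressor vectors of different lengths in `X[0]` and `X[1]`. Maximum likelihood estimation would wrap a function like this in an EM algorithm or a numerical optimiser, which is omitted here.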



Author information

Authors and Affiliations

Department of Statistical Sciences, University of Bologna, via delle Belle Arti 41, 40126, Bologna, Italy
Giuliano Galimberti & Gabriele Soffritti

Corresponding author

Correspondence to Gabriele Soffritti.

About this article

Cite this article

Galimberti, G., Soffritti, G. Seemingly unrelated clusterwise linear regression. Adv Data Anal Classif 14, 235–260 (2020). https://doi.org/10.1007/s11634-019-00369-4

Received: 29 November 2018
Revised: 21 July 2019
Accepted: 05 August 2019
Published: 12 August 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11634-019-00369-4
