Abstract
A substantial body of research on mixtures-of-regressions models has developed over the past 20 years. While much of the recent literature has focused on flexible mixtures-of-regressions models, there is still considerable utility in imposing structure on the mixture components through fully parametric models. One feature of the data that is scarcely addressed in mixtures of regressions is the presence of measurement error in the predictors. The limited existing research on this topic concerns the case where classical measurement error is added to the classic mixtures-of-linear-regressions model. In this paper, we consider mixtures of polynomial regressions where the predictors are subject to classical measurement error. Moreover, each component is allowed to have a polynomial of a different degree. We use a generalized expectation-maximization (EM) algorithm to perform maximum likelihood estimation. To estimate standard errors, we extend a semiparametric bootstrap routine that has been employed for mixtures of linear regressions without measurement error in the predictors. For practical reasons identified in the paper, the numerical work is limited to two-component models. We consider a likelihood ratio test for determining whether one of the components contains a higher-degree polynomial term. Model selection criteria are also highlighted as a way to determine an appropriate model. A simulation study and an application to the classic nitric oxide emissions data are provided.
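To make the model structure concrete, here is a minimal sketch of plain EM for a two-component mixture of polynomial regressions in which each component has its own degree, followed by a BIC comparison of candidate degree pairs. It is not the authors' method: it uses standard EM with a closed-form M-step rather than their generalized EM, and it treats the observed predictor as error-free, so when it is fed the error-prone predictor W = X + U it is the naive fit that ignores classical measurement error. The helper names (`design`, `normal_pdf`, `em_poly_mixture`), the random initialization, the simulated data, and the BIC parameter count are all illustrative assumptions.

```python
import numpy as np


def design(x, degree):
    """Hypothetical helper: design matrix with columns 1, x, ..., x**degree."""
    return np.vander(x, degree + 1, increasing=True)


def normal_pdf(r, sigma):
    """Normal density of residuals r with mean 0 and standard deviation sigma."""
    return np.exp(-0.5 * (r / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def em_poly_mixture(x, y, degrees=(1, 2), n_iter=500, tol=1e-8, seed=0):
    """Sketch only: plain EM for a mixture of polynomial regressions where
    component j has degree degrees[j]. x is treated as error-free, i.e. no
    measurement-error correction is applied."""
    rng = np.random.default_rng(seed)
    n, k = len(y), len(degrees)
    X = [design(x, d) for d in degrees]
    z = rng.dirichlet(np.ones(k), size=n)  # random soft initial responsibilities
    path = []
    for _ in range(n_iter):
        # M-step: mixing proportions, weighted least squares, weighted variances.
        pi = z.mean(axis=0)
        betas, sigmas = [], []
        for j in range(k):
            w = z[:, j]
            Xw = X[j] * w[:, None]
            beta = np.linalg.solve(X[j].T @ Xw, Xw.T @ y)
            resid = y - X[j] @ beta
            betas.append(beta)
            sigmas.append(np.sqrt(np.sum(w * resid**2) / np.sum(w)))
        # E-step: weighted component densities, log-likelihood, responsibilities.
        dens = np.column_stack(
            [pi[j] * normal_pdf(y - X[j] @ betas[j], sigmas[j]) for j in range(k)]
        )
        total = dens.sum(axis=1)
        path.append(np.sum(np.log(total)))
        z = dens / total[:, None]
        if len(path) > 1 and abs(path[-1] - path[-2]) < tol:
            break
    # Assumed parameter count: regression coefficients, one variance per
    # component, and k - 1 free mixing proportions.
    n_params = sum(d + 1 for d in degrees) + k + (k - 1)
    bic = -2.0 * path[-1] + n_params * np.log(n)
    return {"pi": pi, "betas": betas, "sigmas": sigmas, "z": z,
            "loglik": path[-1], "path": path, "bic": bic}


# Simulated (made-up) data: one linear and one quadratic component.
rng = np.random.default_rng(1)
n = 300
x_true = rng.uniform(-2.0, 2.0, n)
linear = rng.random(n) < 0.5
y = np.where(linear, 1.0 + 2.0 * x_true,
             -1.0 + 0.5 * x_true + 1.5 * x_true**2) + rng.normal(0.0, 0.3, n)
# Classical measurement error: only W = X + U is observed.
w_obs = x_true + rng.normal(0.0, 0.2, n)

fit_12 = em_poly_mixture(w_obs, y, degrees=(1, 2))
fit_11 = em_poly_mixture(w_obs, y, degrees=(1, 1))
print("BIC, degrees (1, 2):", fit_12["bic"])
print("BIC, degrees (1, 1):", fit_11["bic"])
```

Comparing BIC across degree pairs mirrors the model-selection idea in the abstract. Because this sketch ignores the error in `w_obs`, its coefficient estimates are generally biased; accounting for that error is what distinguishes the approach described in the abstract from this naive fit.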
Acknowledgements
The authors would like to acknowledge the helpful comments provided by three anonymous referees, which included pointing out relevant literature on developments in cluster-weighted model (CWM) methodology, critical comments about the data analysis, and suggestions about the overall structure of this manuscript.
Cite this article
Fang, X., Chen, A.W. & Young, D.S. Predictors with measurement error in mixtures of polynomial regressions. Comput Stat 38, 373–401 (2023). https://doi.org/10.1007/s00180-022-01232-5