Predictors with measurement error in mixtures of polynomial regressions

Fang, Xiaoqiong; Chen, Andy W.; Young, Derek S.

doi:10.1007/s00180-022-01232-5


Original paper
Published: 13 May 2022

Volume 38, pages 373–401, (2023)

Computational Statistics


Abstract

A substantial body of research on mixtures-of-regressions models has developed over the past 20 years. While much of the recent literature has focused on flexible mixtures-of-regressions models, there is still considerable utility in imposing structure on the mixture components through fully parametric models. One feature of the data that is rarely addressed in mixtures of regressions is the presence of measurement error in the predictors. The limited existing research on this topic concerns the case where classical measurement error is added to the classic mixtures-of-linear-regressions model. In this paper, we consider the setting of mixtures of polynomial regressions where the predictors are subject to classical measurement error. Moreover, each component is allowed to have a different polynomial degree. We use a generalized expectation-maximization algorithm to perform maximum likelihood estimation. To estimate standard errors, we extend a semiparametric bootstrap routine that has been employed for mixtures of linear regressions without measurement error in the predictors. For practical reasons that we identify, the numerical work is limited to estimating two-component models. We consider a likelihood ratio test for determining whether one of the components has a higher-degree polynomial term. Model selection criteria are also highlighted as a way of determining an appropriate model. A simulation study and an application to the classic nitric oxide emissions data are provided.
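To make the model class in the abstract concrete, the following is a minimal sketch, not the authors' generalized EM algorithm. It simulates a two-component mixture in which one component is linear and the other quadratic, observes the predictor with classical additive error (w = x + u), and then fits a naive EM that treats w as if it were error-free. All function names, parameter values, and the choice of BIC are illustrative assumptions; correcting for the measurement error is precisely what the paper's method adds and is not attempted here.

```python
import numpy as np

def simulate(n=400, sigma_u=0.3, seed=0):
    """Hypothetical data: component 0 is linear, component 1 is quadratic.
    The true predictor x is observed only as w = x + u (classical error)."""
    rng = np.random.default_rng(seed)
    z = rng.random(n) < 0.5                       # latent component labels
    x = rng.uniform(-2.0, 2.0, n)                 # true, unobserved predictor
    mean = np.where(z, -1.0 - 1.5 * x + x**2, 1.0 + 2.0 * x)
    y = mean + rng.normal(0.0, 0.3, n)
    w = x + rng.normal(0.0, sigma_u, n)           # error-prone observation
    return w, y, z

def em_mixpoly(w, y, degrees=(1, 2), n_iter=100, seed=0):
    """Naive EM for a mixture of polynomial regressions with one degree
    per component. ASSUMPTION: w is treated as error-free."""
    rng = np.random.default_rng(seed)
    n, k = len(y), len(degrees)
    X = [np.vander(w, d + 1, increasing=True) for d in degrees]
    resp = rng.dirichlet(np.ones(k), size=n)      # random soft assignment
    path = []
    for _ in range(n_iter):
        # M-step: mixing proportions and weighted least squares per component
        lam = resp.mean(axis=0)
        betas, sig2 = [], []
        for j in range(k):
            r = resp[:, j]
            XtR = (X[j] * r[:, None]).T           # X' R with R = diag(r)
            beta = np.linalg.solve(XtR @ X[j], XtR @ y)
            res = y - X[j] @ beta
            betas.append(beta)
            sig2.append((r @ res**2) / r.sum())
        # E-step: posterior membership probabilities on the log scale
        logd = np.column_stack([
            np.log(lam[j]) - 0.5 * np.log(2 * np.pi * sig2[j])
            - 0.5 * (y - X[j] @ betas[j]) ** 2 / sig2[j]
            for j in range(k)
        ])
        m = logd.max(axis=1, keepdims=True)
        logmix = m[:, 0] + np.log(np.exp(logd - m).sum(axis=1))
        resp = np.exp(logd - logmix[:, None])
        path.append(logmix.sum())                 # observed-data log-likelihood
    # free parameters: proportions + polynomial coefficients + variances
    n_par = (k - 1) + sum(d + 1 for d in degrees) + k
    bic = -2.0 * path[-1] + n_par * np.log(n)
    return {"betas": betas, "sig2": np.array(sig2), "lam": lam,
            "loglik_path": np.array(path), "bic": bic}

if __name__ == "__main__":
    w, y, _ = simulate()
    for degs in [(1, 1), (1, 2), (2, 2)]:
        fit = em_mixpoly(w, y, degrees=degs)
        print(degs, "BIC:", round(float(fit["bic"]), 1))
```

Comparing the criterion across degree combinations mirrors, in a simplified way, the model selection step mentioned in the abstract. Because the naive fit ignores the error in w, its coefficient estimates should be expected to be biased when sigma_u is not small, and EM may converge to a local maximum depending on the random start.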



Acknowledgements

The authors would like to acknowledge the helpful comments provided by three anonymous referees, which included highlighting relevant literature about developments in CWM methodology, critical comments about the data analysis, and suggestions about the overall structure of this manuscript.

Author information

Authors and Affiliations

Corporate & Investment Bank, J.P. Morgan, Brooklyn, NY, USA
Xiaoqiong Fang
School of Business, Government, and Economics, Seattle Pacific University, Seattle, WA, USA
Andy W. Chen
Dr. Bing Zhang Department of Statistics, University of Kentucky, Lexington, KY, USA
Derek S. Young


Corresponding author

Correspondence to Derek S. Young.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 70 KB)


About this article

Cite this article

Fang, X., Chen, A.W. & Young, D.S. Predictors with measurement error in mixtures of polynomial regressions. Comput Stat 38, 373–401 (2023). https://doi.org/10.1007/s00180-022-01232-5


Received: 03 November 2020
Accepted: 19 April 2022
Published: 13 May 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s00180-022-01232-5
