Predictors with measurement error in mixtures of polynomial regressions

  • Original paper
  • Published in: Computational Statistics

Abstract

A substantial body of research on mixtures-of-regressions models has developed over the past 20 years. While much of the recent literature has focused on flexible mixtures-of-regressions models, there is still considerable utility in imposing structure on the mixture components through fully parametric models. One feature of the data that is rarely addressed in mixtures of regressions is the presence of measurement error in the predictors. The limited existing research on this topic concerns the case where classical measurement error is added to the classic mixtures-of-linear-regressions model. In this paper, we consider the setting of mixtures of polynomial regressions where the predictors are subject to classical measurement error. Moreover, each component is allowed to have a different polynomial degree. We use a generalized expectation-maximization algorithm to perform maximum likelihood estimation. For estimating standard errors, we extend a semiparametric bootstrap routine that has been employed for mixtures of linear regressions without measurement error in the predictors. For practical reasons identified in the paper, the numerical work is limited to estimating two-component models. We consider a likelihood ratio test for determining whether there is a higher-degree polynomial term in one of the components. Model selection criteria are also highlighted as a way of determining an appropriate model. A simulation study and an application to the classic nitric oxide emissions data are provided.
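To make the setting in the abstract concrete, the following is a minimal simulation sketch, not the authors' estimation procedure. It draws data from a two-component mixture in which one component is linear and the other quadratic, and it observes the predictor only through W = X + U, where U is classical measurement error. The function names, coefficients, mixing proportion, and error variances are illustrative assumptions and are not taken from the paper. The comparison at the end uses the true component labels, which are unavailable in practice, purely to show how ignoring the measurement error attenuates the fitted slope.

```python
import numpy as np


def simulate_mixture_me(n=5000, sigma_u=0.5, seed=0):
    """Simulate a two-component mixture of polynomial regressions.

    Component 0 is linear and component 1 is quadratic. The true predictor x
    is observed only as w = x + u (classical measurement error).
    All parameter values here are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    beta0 = np.array([1.0, 2.0])         # component 0: 1 + 2x
    beta1 = np.array([-1.0, 0.5, 1.5])   # component 1: -1 + 0.5x + 1.5x^2
    z = rng.binomial(1, 0.4, size=n)     # latent component labels
    x = rng.normal(0.0, 1.0, size=n)     # true, unobserved predictor
    u = rng.normal(0.0, sigma_u, size=n) # measurement error, independent of x
    w = x + u                            # error-prone predictor actually observed
    # np.polyval expects the highest-degree coefficient first, hence [::-1].
    mean = np.where(z == 0,
                    np.polyval(beta0[::-1], x),
                    np.polyval(beta1[::-1], x))
    y = mean + rng.normal(0.0, 0.5, size=n)
    return w, x, y, z


def slope_linear_component(pred, y, z):
    """Least-squares slope within component 0, using oracle labels z."""
    mask = z == 0
    return np.polyfit(pred[mask], y[mask], 1)[0]


w, x, y, z = simulate_mixture_me()
slope_true = slope_linear_component(x, y, z)   # fit on the true predictor
slope_naive = slope_linear_component(w, y, z)  # fit ignoring measurement error
print(f"slope using x: {slope_true:.3f}, slope using w: {slope_naive:.3f}")
```

Under these assumed values, the naive slope in the linear component is pulled toward zero by roughly the reliability ratio var(X) / (var(X) + var(U)) = 1 / 1.25 = 0.8. This is the standard attenuation result for linear regression with classical measurement error, and it is the kind of bias that motivates modeling the error explicitly.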

[Figs. 1–4; data source: Brinkman (1981)]

References

  • Aitkin M, Rocci R (2002) A general maximum likelihood analysis of measurement error in generalized linear models. Stat Comput 12(2):163–174

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Second international symposium on information theory. Akademiai Kiado, Budapest, pp 267–281

  • Benaglia T, Chauveau D, Hunter DR, Young DS (2009) mixtools: an R package for analyzing finite mixture models. J Stat Softw 32(6):1–29

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Blackwell M, Honaker J, King G (2017) A unified approach to measurement error and missing data: overview and applications. Sociol Methods Res 46(3):303–341

  • Bordes L, Delmas C, Vandekerkhove P (2006) Semiparametric estimation of a two-component mixture model where one component is known. Scand J Stat 33(4):733–752

  • Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52(3):345–370

  • Brinkman ND (1981) Ethanol fuel - a single-cylinder engine study of efficiency and exhaust emissions. In: Society of automotive engineers technical paper 810345

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach. Springer, New York, NY

  • Çakmak A, Kapusuz M, Özcan H (2018) Experimental research on emissions of an SI engine under oxygen-enriched intake air. In: International technological sciences and designs symposium. Akademiai Kiado, Turkey, pp 991–1000

  • Carroll RJ, Roeder K, Wasserman L (1999) Flexible parametric measurement error models. Biometrics 55(1):44–54

  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models. Chapman & Hall/CRC Monographs on Statistics & Applied Probability Taylor & Francis, London, pp 5–55

  • Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recogn 28(5):781–793

  • Cheng CL, Van Ness JW (1998) Statistical regression with measurement error. Wiley, Hoboken, NJ

  • De Veaux RD (1989) Mixtures of linear regressions. Comput Stat Data Anal 8(3):227–245

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Stat Methodol) 39(1):1–38

  • Frisch R (1935) Statistical confluence analysis by means of complete regression systems. Econ J 45(180):741–742

  • Frühwirth-Schnatter S (2006) Finite mixture and Markov switching models. Springer, New York, NY

  • Fuller WA (1987) Measurement error models. John Wiley & Sons Inc, New York, NY

  • Gershenfeld N (1997) Nonlinear inference and cluster-weighted modeling. Ann N Y Acad Sci 808(1):18–24

  • Grün B, Leisch F (2008) Finite mixtures of generalized linear regression models. In: Shalabh, Heumann C (eds) Recent advances in linear models and related areas: essays in honour of Helge Toutenburg. Physica-Verlag HD, Heidelberg, Germany, pp 205–230

  • Hennig C (2000) Identifiability of models for clusterwise linear regression. J Classif 17(2):273–296

  • Hilden DL, Parks FB (1976) A single-cylinder engine study of methanol fuel-emphasis on organic emissions. In: Society of automotive engineers technical paper 760378

  • Hunter DR, Young DS (2012) Semiparametric mixtures of regressions. J Nonparametr Stat 24(1):19–38

  • Hurn M, Justel A, Robert CP (2003) Estimating mixtures of regressions. J Comput Graph Stat 12(1):55–79

  • Hurvich CM, Simonoff JS, Tsai C-L (1998) Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J R Stat Soc Ser B (Stat Methodol) 60(2):271–293

  • Ingrassia S, Minotti SC, Vittadini G (2012) Local statistical modeling via a cluster-weighted approach with elliptical distributions. J Classif 29(3):363–401

  • Ingrassia S, Minotti SC, Punzo A (2014) Model-based clustering via linear cluster-weighted models. Comput Stat Data Anal 71:159–182

  • Ingrassia S, Punzo A, Vittadini G, Minotti SC (2015) The generalized linear mixed cluster-weighted model. J Classif 32(1):85–113

  • Jacobs R, Jordan MI, Nowlan SJ, Hinton GE (1991) Adaptive mixture of local experts. Neural Comput 3:78–88

  • Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6(2):181–214

  • Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27(4):887–906

  • Kuha J (1997) Estimation by data augmentation in regression models with continuous and discrete covariates measured with error. Stat Med 16(2):189–201

  • Kuha J, Temple J (2003) Covariate measurement error in quadratic regression. Int Stat Rev 71(1):131–150

  • Laird N (1978) Nonparametric maximum likelihood estimation of a mixing distribution. J Am Stat Assoc 73(364):805–811

  • Lenk PJ, DeSarbo WS (2000) Bayesian inference for finite mixtures of generalized linear models with random effects. Psychometrika 65(1):93–119

  • Lindsay BG (1983) The geometry of mixture likelihoods: a general theory. Ann Stat 11(1):86–94

  • Lindsay BG (1995) Mixture models: theory, geometry and applications, volume 5 of NSF-CBMS regional conference series in probability and statistics. Institute of Mathematical Statistics and the American Statistical Association

  • Mazza A, Punzo A, Ingrassia S (2018) flexCWM: a flexible framework for cluster-weighted models. J Stat Softw 86(2):1–30

  • McLachlan G, Basford K (1988) Mixture models: inference and applications to clustering. Marcel Dekker, New York, NY

  • McLachlan GJ (1987) On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J R Stat Soc Ser C (Appl Stat) 36(3):318–324

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York, NY

  • Mengersen KL, Robert CP, Titterington DM (eds) (2011) Mixtures: estimation and applications. Wiley, West Sussex, England

  • Midthune D, Carroll RJ, Freedman LS, Kipnis V (2016) Measurement error models with interactions. Biostatistics 17(2):277–290

  • Montuelle L, Le Pennec E (2014) Mixture of Gaussian regressions model with logistic weights, a penalized maximum likelihood approach. Electron J Stat 8(1):1661–1695

  • Patra RK, Sen B (2016) Estimation of a two-component mixture model with applications to multiple testing. J R Stat Soc Ser B 78(4):869–893

  • Punzo A (2014) Flexible mixture modelling with the polynomial Gaussian cluster-weighted model. Stat Model 14(3):257–291

  • Richardson S, Gilks WR (1993) A Bayesian approach to measurement error problems in epidemiology using conditional independence models. Am J Epidemiol 138(6):430–442

  • Richardson S, Leblond L, Jaussent I, Green PJ (2002) Mixture models in measurement error problems, with reference to epidemiological studies. J R Stat Soc Ser A (Stat Soc) 165(3):549–566

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Shen Z, Levine M, Shang Z (2018) An MM algorithm for estimation of a two component semiparametric density mixture with a known component. Electron J Stat 12(1):1181–1209

  • Spiegelman D, Rosner B, Logan R (2000) Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs. J Am Stat Assoc 95(449):51–61

  • Stefanski LA, Carroll RJ (1985) Covariate measurement error in logistic regression. Ann Stat 13(4):1335–1351

  • Stephens M (2000) Dealing with label switching in mixture models. J R Stat Soc Ser B (Stat Methodol) 62(4):795–809

  • Sugar EA, Wang C-Y, Prentice RL (2007) Logistic regression with exposure biomarkers and flexible measurement error. Biometrics 63(1):143–151

  • Teicher H (1963) Identifiability of finite mixtures. Ann Math Stat 34(4):1265–1269

  • Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York, NY

  • Turner RT (2000) Estimating the propagation rate of a viral infection of potato plants via mixtures of regressions. J R Stat Soc Ser C (Appl Stat) 49(3):371–384

  • Vandekerkhove P (2013) Estimation of a semiparametric mixture of regressions model. J Nonparametr Stat 25(1):181–208

  • Velosa J (1993) Error analysis of the vehicle exhaust emission measurement system. In: Society of automotive engineers technical paper 930393

  • Viele K, Tong B (2002) Modeling with mixtures of linear regressions. Stat Comput 12(4):315–330

  • Vilca F, Balakrishnan N, Zeller CB (2014) The bivariate Sinh-elliptical distribution with applications to Birnbaum-Saunders distribution and associated regression and measurement error models. Comput Stat Data Anal 80:1–16

  • Wedel M, DeSarbo WS (1995) A mixture likelihood approach for generalized linear models. J Classif 12(1):21–55

  • Yakowitz SJ, Spragins JD (1968) On the identifiability of finite mixtures. Ann Math Stat 39(1):209–214

  • Yang R, Sun X, Liu Z, Zhang Y, Fu J (2021) A numerical analysis of the effects of equivalence ratio measurement accuracy on the engine efficiency and emissions at varied compression ratios. Processes 9(8):1–14

  • Yao W, Song W (2015) Mixtures of linear regression with measurement errors. Commun Stat Theory Methods 44(8):1602–1614

  • Young DS (2014) Mixtures of regressions with changepoints. Stat Comput 24(2):265–281

  • Young DS (2017) Handbook of regression methods. Chapman and Hall/CRC Press, Boca Raton, FL

Acknowledgements

The authors would like to acknowledge the helpful comments provided by three anonymous referees, which included highlighting relevant literature about developments in CWM methodology, critical comments about the data analysis, and suggestions about the overall structure of this manuscript.

Author information

Corresponding author

Correspondence to Derek S. Young.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 70 KB)

About this article

Cite this article

Fang, X., Chen, A.W. & Young, D.S. Predictors with measurement error in mixtures of polynomial regressions. Comput Stat 38, 373–401 (2023). https://doi.org/10.1007/s00180-022-01232-5

  • DOI: https://doi.org/10.1007/s00180-022-01232-5
