Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods

  • Original paper
  • Computational Statistics

Abstract

Logistic regression is a standard model in many studies of binary outcome data, and the treatment of missing data in this model remains an active topic. Building on the idea of Wang and Chen (2009), two types of multiple imputation (MI) estimation methods are proposed for estimating the parameters of logistic regression with covariates missing at random (MAR) separately or simultaneously. Each method uses three empirical conditional distribution functions to generate the random values that impute the missing data, and the parameters are then estimated with the estimating equations of Fay (1996). Both methods are derived under the assumption that covariates are MAR separately or simultaneously, and exclusively for categorical/discrete data. Simulation studies indicate that the two proposed methods are computationally effective. Their efficiency is quite similar, and they outperform the complete-case, semiparametric inverse probability weighting, validation likelihood, and random forest MI by chained equations (MICE) methods. Although the two proposed methods are comparable with the joint conditional likelihood (JCL) method, they involve more straightforward calculations and shorter computing times than the JCL and MICE methods. Two real data examples illustrate the applicability of the proposed methods.
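The general idea of imputing a discrete covariate from an empirical conditional distribution can be illustrated with a minimal sketch. It is not the authors' estimator: it handles a single binary covariate rather than covariates missing separately or simultaneously, and the simulated data, the fully observed surrogate `z`, the function names, and the choice to weight each imputed row by 1/M in a pooled score equation are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, w=None, n_iter=50):
    """Weighted logistic regression by Newton-Raphson (X includes an intercept column)."""
    w = np.ones(len(y)) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))                     # weighted score vector
        info = (X * (w * p * (1 - p))[:, None]).T @ X   # weighted information matrix
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def impute_from_empirical(x, observed, y, z, rng):
    """Draw each missing x from the observed x values in the same (y, z) cell.

    Assumes every (y, z) cell contains at least one observed donor.
    """
    draws = x.copy()
    for i in np.flatnonzero(~observed):
        donors = x[observed & (y == y[i]) & (z == z[i])]
        draws[i] = rng.choice(donors)
    return draws

# Hypothetical simulated data: binary outcome y, binary covariate x, surrogate z.
rng = np.random.default_rng(0)
n, M = 2000, 10
z = rng.integers(0, 2, n)                                   # always observed
x_true = rng.binomial(1, 0.3 + 0.4 * z)                     # subject to missingness
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x_true))))
# MAR: the probability of observing x depends only on the observed (y, z).
observed = rng.random(n) < 0.5 + 0.2 * y + 0.2 * z
x = np.where(observed, x_true, -1)                          # -1 marks a missing value

# Pooled score equation: complete cases get weight 1, each imputed row weight 1/M.
miss = ~observed
X_parts = [np.column_stack([np.ones(observed.sum()), x[observed]])]
y_parts = [y[observed]]
w_parts = [np.ones(observed.sum())]
for _ in range(M):
    x_m = impute_from_empirical(x, observed, y, z, rng)
    X_parts.append(np.column_stack([np.ones(miss.sum()), x_m[miss]]))
    y_parts.append(y[miss])
    w_parts.append(np.full(miss.sum(), 1.0 / M))

beta_mi = fit_logistic(np.vstack(X_parts), np.concatenate(y_parts),
                       np.concatenate(w_parts))
print("MI estimate (intercept, slope):", beta_mi)
```

In this sketch the data are generated with intercept -1 and slope 1, so the printed estimate can be compared against those values. Drawing donors within cells of the observed variables is what ties the imputation to the MAR assumption: within a cell, observed and missing covariate values are assumed to share the same distribution.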

References

  • Breslow NE, Cain KC (1988) Logistic regression for two-stage case-control data. Biometrika 75:11–20

  • van Buuren S, Groothuis-Oudshoorn K (2011) mice: multivariate imputation by chained equations in R. J Stat Softw 45(3):1–67

  • Dong Y, Peng CYJ (2013) Principled missing data methods for researchers. Springer, Berlin

  • Fay RE (1996) Alternative paradigms for the analysis of imputed survey data. J Am Stat Assoc 91:490–498

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685

  • Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, 3rd edn. Wiley, New York

  • Hsieh SH, Lee SM, Shen PS (2010) Logistic regression analysis of randomized response data with missing covariates. J Stat Plann Infer 140:927–940

  • Hsieh SH, Li CS, Lee SM (2013) Logistic regression with outcome and covariates missing separately or simultaneously. Comput Stat Data Anal 66:32–54

  • Jiang W, Josse J, Lavielle M, TraumaBase Group (2020) Logistic regression with missing covariates: parameter estimation, model selection and prediction within a joint-modeling framework. Comput Stat Data Anal 145:106907

  • Lee SM, Gee MJ, Hsieh SH (2011) Semiparametric methods in the proportional odds model for ordinal response data with missing covariates. Biometrics 67:788–798

  • Lee SM, Hwang WH, de Dieu Tapsoba J (2016) Estimation in closed capture-recapture models when covariates are missing at random. Biometrics 72:1294–1304

  • Lee SM, Li CS, Hsieh SH, Huang LH (2012) Semiparametric estimation of logistic regression model with missing covariates and outcome. Metrika 75:621–653

  • Lee SM, Lukusa TM, Li CS (2020) Estimation of a zero-inflated Poisson regression model with missing covariates via nonparametric multiple imputation methods. Comput Stat 35:725–754

  • Lipsitz SR, Parzen M, Ewell M (1998) Inference using conditional logistic regression with missing covariates. Biometrics 54:295–303

  • Little RJ (1992) Regression with missing X’s: a review. J Am Stat Assoc 87:1227–1237

  • Little RJ, Rubin DB (2019) Statistical analysis with missing data, 3rd edn. Wiley, New York

  • Lukusa TM, Lee SM, Li CS (2016) Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates. Metrika 79:457–483

  • Pahel BT, Preisser JS, Stearns SC, Rozier RG (2011) Multiple imputation of dental caries data using a zero-inflated Poisson regression model. J Public Health Dent 71:71–78

  • Rubin DB (1976) Inference and missing data. Biometrika 63:581–592

  • Rubin DB (1987) Statistical analysis with missing data. Wiley, New York

  • Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91:473–489

  • Rubin DB, Schenker N (1986) Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81:366–374

  • Tran PL, Le TN, Lee SM, Li CS (2021) Estimation of parameters of logistic regression with covariates missing separately or simultaneously. Communications in statistics - Theory and methods, in press

  • Wang CY, Chen JC, Lee SM, Ou ST (2002) Joint conditional likelihood estimator in logistic regression with missing covariate data. Statistica Sinica 12:555–574

  • Wang CY, Wang S, Zhao LP, Ou ST (1997) Weighted semiparametric estimation in regression analysis with missing covariate data. J Am Stat Assoc 92:512–525

  • Wang D, Chen SX (2009) Empirical likelihood for estimating equations with missing values. Ann Stat 37:490–517

  • Wang S, Wang CY (2001) A note on kernel assisted estimators in missing covariate regression. Statistics and Probability Letters 55:439–449

  • White IR, Royston P, Wood AM (2011) Multiple imputation using chained equations: issues and guidance for practice. Stat Med 30:377–399

  • Zhao LP, Lipsitz S (1992) Designs and analysis of two-stage studies. Stat Med 11:769–782

Acknowledgements

The authors thank two referees and an Associate Editor for their constructive comments, which improved the presentation. The research of S.M. Lee and T.N. Le was supported by the Ministry of Science and Technology (MOST) of Taiwan, ROC, under Grant MOST-109-2118-M-035-002-MY3.

Author information

Corresponding author

Correspondence to Chin-Shang Li.

About this article

Cite this article

Lee, SM., Le, TN., Tran, PL. et al. Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods. Comput Stat 38, 899–934 (2023). https://doi.org/10.1007/s00180-022-01250-3

  • DOI: https://doi.org/10.1007/s00180-022-01250-3
