Abstract
Logistic regression is a standard model in many studies of binary outcome data, and the analysis of missing data in this model has received considerable attention. Building on the idea of Wang and Chen (2009), we propose two types of multiple imputation (MI) estimation methods for the parameters of logistic regression when covariates are missing at random (MAR) separately or simultaneously. Each method uses three empirical conditional distribution functions to generate random values that impute the missing data, and the parameters are then estimated with the estimating equations of Fay (1996). The two proposed MI estimation methods are derived under the assumption that covariates are MAR separately or simultaneously, and exclusively for categorical/discrete data. Simulation studies show that the two proposed methods are computationally effective. They have quite similar efficiency and outperform the complete-case, semiparametric inverse probability weighting, validation likelihood, and random forest MI by chained equations (MICE) methods. Although the two proposed methods are comparable with the joint conditional likelihood (JCL) method, they involve more straightforward calculations and shorter computing times than the JCL and MICE methods. Two real data examples illustrate the applicability of the proposed methods.
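The general idea of imputing a discrete covariate from an empirical conditional distribution can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes a single discrete covariate `x` that is MAR given the outcome `y` and a fully observed discrete covariate `v`, it uses one empirical conditional distribution rather than three, and it pools by averaging the per-imputation maximum likelihood estimates instead of solving the estimating equations of Fay (1996). All function and variable names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson maximum likelihood fit; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def impute_once(x, y, v, observed, rng):
    """Draw each missing x from the empirical distribution of x among
    complete cases that share the same (y, v) values."""
    x_imp = x.copy()
    for y_val, v_val in set(zip(y[~observed], v[~observed])):
        cell = (y == y_val) & (v == v_val)
        donors = x[cell & observed]
        if donors.size == 0:
            raise ValueError("no complete cases in cell (y, v) = (%s, %s)" % (y_val, v_val))
        missing = cell & ~observed
        x_imp[missing] = rng.choice(donors, size=missing.sum(), replace=True)
    return x_imp

def mi_logistic(x, y, v, observed, n_imputations=20, seed=0):
    """Average the logistic regression estimates over several imputed data sets."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_imputations):
        x_imp = impute_once(x, y, v, observed, rng)
        design = np.column_stack([np.ones(len(y)), x_imp, v])
        estimates.append(fit_logistic(design, y))
    return np.mean(estimates, axis=0)

# Simulated example (all values below are illustrative assumptions).
rng = np.random.default_rng(1)
n = 2000
v = rng.integers(0, 2, n)                  # fully observed binary covariate
x = rng.integers(0, 3, n).astype(float)    # discrete covariate, later partly missing
eta = -1.0 + 0.5 * x + 1.0 * v
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)

# MAR mechanism: the chance of observing x depends only on (y, v).
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 0.5 * y - 0.5 * v)))
observed = rng.random(n) < p_obs
x_miss = np.where(observed, x, np.nan)

beta_hat = mi_logistic(x_miss, y, v, observed)
print(beta_hat)  # estimates of (intercept, coefficient of x, coefficient of v)
```

Because the missingness in this sketch depends only on `(y, v)`, the distribution of `x` among complete cases within each `(y, v)` cell matches that of the incomplete cases, which is what justifies drawing imputations from the observed donors.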

References
Breslow NE, Cain KC (1988) Logistic regression for two-stage case-control data. Biometrika 75:11–20
Buuren SV, Groothuis-Oudshoorn K (2011) Mice: multivariate imputation by chained equations in R. J Stat Softw 45(3):1–67
Dong Y, Peng CYJ (2013) Principled missing data methods for researchers. Springer, Berlin
Fay RE (1996) Alternative paradigms for the analysis of imputed survey data. J Am Stat Assoc 91:490–498
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, 3rd edn. Wiley, New York
Hsieh SH, Lee SM, Shen PS (2010) Logistic regression analysis of randomized response data with missing covariates. J Stat Plann Infer 140:927–940
Hsieh SH, Li CS, Lee SM (2013) Logistic regression with outcome and covariates missing separately or simultaneously. Comput Stat Data Anal 66:32–54
Jiang W, Josse J, Lavielle M, Group T (2020) Logistic regression with missing covariates: parameter estimation, model selection and prediction within a joint-modeling framework. Comput Stat Data Anal 145:106907
Lee SM, Gee MJ, Hsieh SH (2011) Semiparametric methods in the proportional odds model for ordinal response data with missing covariates. Biometrics 67:788–798
Lee SM, Hwang WH, de Dieu Tapsoba J (2016) Estimation in closed capture-recapture models when covariates are missing at random. Biometrics 72:1294–1304
Lee SM, Li CS, Hsieh SH, Huang LH (2012) Semiparametric estimation of logistic regression model with missing covariates and outcome. Metrika 75:621–653
Lee SM, Lukusa TM, Li CS (2020) Estimation of a zero-inflated Poisson regression model with missing covariates via nonparametric multiple imputation methods. Comput Stat 35:725–754
Lipsitz SR, Parzen M, Ewell M (1998) Inference using conditional logistic regression with missing covariates. Biometrics 54:295–303
Little RJ (1992) Regression with missing X’s: a review. J Am Stat Assoc 87:1227–1237
Little RJ, Rubin DB (2019) Statistical analysis with missing data, 3rd edn. Wiley, New York
Lukusa TM, Lee SM, Li CS (2016) Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates. Metrika 79:457–483
Pahel BT, Preisser JS, Stearns SC, Rozier RG (2011) Multiple imputation of dental caries data using a zero-inflated Poisson regression model. J Public Health Dent 71:71–78
Rubin DB (1976) Inference and missing data. Biometrika 63:581–592
Rubin DB (1987) Statistical analysis with missing data. Wiley, New York
Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91:473–489
Rubin DB, Schenker N (1986) Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81:366–374
Tran PL, Le TN, Lee SM, Li CS (2021) Estimation of parameters of logistic regression with covariates missing separately or simultaneously. Communications in statistics - Theory and methods, in press
Wang CY, Chen JC, Lee SM, Ou ST (2002) Joint conditional likelihood estimator in logistic regression with missing covariate data. Statistica Sinica 12:555–574
Wang CY, Wang S, Zhao LP, Ou ST (1997) Weighted semiparametric estimation in regression analysis with missing covariate data. J Am Stat Assoc 92:512–525
Wang D, Chen SX (2009) Empirical likelihood for estimating equations with missing values. Ann Stat 37:490–517
Wang S, Wang CY (2001) A note on kernel assisted estimators in missing covariate regression. Statistics and Probability Letters 55:439–449
White IR, Royston P, Wood AM (2011) Multiple imputation using chained equations: issues and guidance for practice. Stat Med 30:377–399
Zhao LP, Lipsitz S (1992) Designs and analysis of two-stage studies. Stat Med 11:769–782
Acknowledgements
The authors thank two referees and an Associate Editor for their constructive comments that improved the presentation. The research of S.M. Lee and T.N. Le was supported by Ministry of Science and Technology (MOST) Grant of Taiwan, ROC, MOST-109-2118-M-035-002-MY3.
Cite this article
Lee, SM., Le, TN., Tran, PL. et al. Estimation of logistic regression with covariates missing separately or simultaneously via multiple imputation methods. Comput Stat 38, 899–934 (2023). https://doi.org/10.1007/s00180-022-01250-3
DOI: https://doi.org/10.1007/s00180-022-01250-3