Approximate conditional likelihood for generalized linear models with general missing data mechanism

Journal of Systems Science and Complexity

Abstract

The generalized linear model is an indispensable tool for analyzing non-Gaussian response data, with both canonical and non-canonical link functions widely used. When missing values are present, many existing methods in the literature depend heavily on an unverifiable assumption about the missing data mechanism, and they fail when that assumption is violated. This paper proposes a missing data mechanism that is as generally applicable as possible: it covers both ignorable and nonignorable missing data, and it allows missing values in the response as well as in the covariates. Under this general missing data mechanism, the authors adopt an approximate conditional likelihood method to estimate the unknown parameters. The authors rigorously establish the regularity conditions under which the unknown parameters are identifiable under the approximate conditional likelihood approach. For the identifiable parameters, the authors prove the asymptotic normality of the estimators obtained by maximizing the approximate conditional likelihood. Simulation studies are conducted to evaluate the finite-sample performance of the proposed estimators and of estimators from some existing methods. Finally, the authors present a biomarker analysis from a prostate cancer study to illustrate the proposed method.

Author information

Corresponding author

Correspondence to Jiwei Zhao.

Additional information

This paper was supported by the Chinese 111 Project B14019, the US National Science Foundation under Grant Nos. DMS-1305474 and DMS-1612873, and the US National Institutes of Health Award UL1TR001412.

This paper was recommended for publication by Editor-in-Chief GAO Xiao-Shan.

About this article

Cite this article

Zhao, J., Shao, J. Approximate conditional likelihood for generalized linear models with general missing data mechanism. J Syst Sci Complex 30, 139–153 (2017). https://doi.org/10.1007/s11424-017-6188-3

  • DOI: https://doi.org/10.1007/s11424-017-6188-3
