Composite likelihood and maximum likelihood methods for joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data

  • Original Paper
Computational Statistics

Abstract

Joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data has been proposed to study the relationship between diseases and their related biomarkers. However, statistical inference for joint latent class models has proved very challenging because of the computational complexity of obtaining maximum likelihood estimates. In this article, we propose a series of composite likelihoods for maximum composite likelihood estimation, as well as an enhanced Monte Carlo expectation–maximization (MCEM) algorithm for maximum likelihood estimation, in the context of joint latent class models. Theoretically, the maximum composite likelihood estimates are consistent and asymptotically normal. Numerically, we show that, compared with the MCEM algorithm that maximizes the full likelihood, the composite likelihood approach coupled with a quasi-Newton method not only substantially reduces computational complexity and running time but also retains comparable estimation efficiency.
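To make maximum composite likelihood estimation concrete, the following Python sketch fits a pairwise (composite) likelihood to a deliberately simple toy model: clusters of equicorrelated standard normal outcomes with a single correlation parameter \(\rho\). The toy model, the function names `simulate`, `pairs`, `pairwise_loglik` and `mcle_rho`, and the simulation settings are illustrative assumptions. They are not the joint latent class model or the authors' code. The composite log-likelihood sums bivariate normal log-densities over all within-cluster pairs instead of evaluating the full multivariate density. Because this toy has only one parameter, the composite score equation is solved by bisection, not by the quasi-Newton optimizer mentioned above.

```python
import math
import random


def simulate(n_clusters, m, rho, seed=1):
    """Toy data: y_ij = sqrt(rho) * z_i + sqrt(1 - rho) * e_ij, so Var = 1 and Corr = rho."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        z = rng.gauss(0.0, 1.0)
        data.append([math.sqrt(rho) * z + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
                     for _ in range(m)])
    return data


def pairs(data):
    """Yield every within-cluster pair (y_ij, y_ik) with j < k."""
    for y in data:
        for j in range(len(y)):
            for k in range(j + 1, len(y)):
                yield y[j], y[k]


def pairwise_loglik(rho, data):
    """Composite log-likelihood: sum of bivariate N(0, 1, rho) log-densities over all pairs."""
    one_minus = 1.0 - rho * rho
    total = 0.0
    for a, b in pairs(data):
        q = a * a - 2.0 * rho * a * b + b * b
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(one_minus) - q / (2.0 * one_minus)
    return total


def mcle_rho(data, tol=1e-10):
    """Maximum pairwise likelihood estimate of rho via bisection on the composite score."""
    n_pairs, s_ab, s_sq = 0, 0.0, 0.0
    for a, b in pairs(data):
        n_pairs += 1
        s_ab += a * b
        s_sq += a * a + b * b
    s_ab /= n_pairs
    s_sq /= n_pairs

    def score_numerator(r):
        # The composite score equals n_pairs * score_numerator(r) / (1 - r^2)^2,
        # so both share the same roots on (-1, 1).
        return -r ** 3 + s_ab * r ** 2 + (1.0 - s_sq) * r + s_ab

    # score_numerator(-1) = mean((a + b)^2) >= 0 and score_numerator(1) = -mean((a - b)^2) <= 0,
    # so a sign change (and hence a root) is guaranteed on [-1, 1].
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_numerator(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = simulate(n_clusters=2000, m=4, rho=0.5)
    rho_hat = mcle_rho(data)
    print(rho_hat, pairwise_loglik(rho_hat, data))
```

The sketch only ever evaluates bivariate densities. In richer models, that restriction is typically where composite likelihoods save computation, at some potential cost in efficiency relative to the full likelihood. Setting `rho=0.0` in `simulate` gives independent outcomes, for which the estimate should be close to zero.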

References

  • Bellio R, Varin C (2005) A pairwise likelihood approach to generalized linear models with crossed random effects. Stat Model 5:217–227

  • Booth JG, Hobert JP (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc, Ser B 61:265–285

  • Buck Louis GM, Weiner JM, Whitcomb BW, Sperrazza R, Schisterman EF, Lobdell DT, Crickard K, Greizerstein H, Kostyniak PJ (2005) Environmental PCB exposure and risk of endometriosis. Hum Reprod 20(1):279–285

  • Byrd RH, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16:1190–1208

  • Cave M, Appana S, Patel M, Falkner KC, McClain CJ, Brock G (2010) Polychlorinated biphenyls, lead, and mercury are associated with liver disease in American adults: NHANES 2003–2004. Environ Health Perspect 118(12):1735–1742

  • Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (2008) National Health and Nutrition Examination Survey Data. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2003–2004, Hyattsville

  • Chao HR, Wang SL, Lee WJ, Wang YF, Päpke O (2007) Levels of polybrominated diphenyl ethers (PBDEs) in breast milk from central Taiwan and their relation to infant birth outcome and maternal menstruation effects. Environ Int 33(2):239–245

  • Chan JS, Kuk AY (1997) Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 53:86–97

  • Clayton D, Rasbash J (1999) Estimation in large crossed random-effect models by data augmentation. J R Stat Soc, Ser A 162:425–436

  • Coull BA, Hobert JP, Ryan LM, Holmes LB (2001) Crossed random effect models for multiple outcomes in a study of teratogenesis. J Am Stat Assoc 96(456):1194–1204

  • Ding G, Shi R, Gao Y, Zhang Y, Kamijima M, Sakai K, Wang G, Feng C, Tian Y (2012) Pyrethroid pesticide exposure and risk of childhood acute lymphocytic leukemia in Shanghai. Environ Sci Technol 46(24):13480–13487

  • Gennings C, Sabo R, Carney E (2010) Identifying subsets of complex mixtures most associated with complex diseases. Epidemiology 21(4):S77–S84

  • Geyer CJ, Thompson EA (1992) Constrained Monte Carlo maximum likelihood for dependent data (with discussion). J R Stat Soc, Ser B 54(3):657–699

  • Giboney PT (2005) Mildly elevated liver transaminase levels in the asymptomatic patient. Am Fam Physician 71(6):1105–1110

  • Herbstman JB, Sjödin A, Jones R, Kurzon M, Lederman SA, Rauh VA, Needham LL, Wang R, Perera FP (2008) Prenatal exposure to PBDEs and neurodevelopment. Epidemiology 19(6):S348

  • Kortenkamp A (2008) Low dose mixture effects of endocrine disrupters: implications for risk assessment and epidemiology. Int J Androl 31(2):233–237

  • Kratz A, Ferraro M, Sluss PM, Lewandrowski KB (2004) Case records of the Massachusetts General Hospital: laboratory values. N Engl J Med 351(15):1549–1563

  • Lin X (1997) Variance component testing in generalised linear models with random effects. Biometrika 84:309–326

  • Lindsay B (1988) Composite likelihood methods. Contemp Math 80:221–239

  • Main KM, Kiviranta H, Virtanen HE, Sundqvist E, Tuomisto JT, Tuomisto J, Vartiainen T, Skakkebaek NE, Toppari J (2007) Flame retardants in placenta and breast milk and cryptorchidism in newborn boys. Environ Health Perspect 115(10):1519–1526

  • McCulloch CE (1997) Maximum likelihood algorithms for generalized linear mixed models. J Am Stat Assoc 92:162–170

  • Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York

  • Olsen MK, Schafer JL (2001) A two-part random-effects model for semicontinuous longitudinal data. J Am Stat Assoc 96:730–745

  • Pinheiro JC, Chao EC (2006) Efficient Laplacian and adaptive Gaussian quadrature algorithms for multilevel generalized linear mixed models. J Comput Graph Stat 15:58–81

  • Renard D, Molenberghs G, Geys H (2004) A pairwise likelihood approach to estimation in multilevel probit models. Comput Stat Data Anal 44(4):649–667

  • Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21:5–42

  • Xie Y, Chen Z, Albert PS (2013) A crossed random effects modeling approach for estimating diagnostic accuracy from ordinal ratings without a gold standard. Stat Med 32(20):3472–3485

  • Zhang B, Chen Z, Albert PS (2012) Latent class models for joint analysis of disease prevalence and high-dimensional semicontinuous biomarker data. Biostatistics 13(1):74–88

Acknowledgments

We sincerely thank two anonymous reviewers, the Associate Editor, and the Editors for their valuable comments, which have substantially improved this manuscript. The views expressed in this article are those of the authors and do not necessarily represent the views of the US Food and Drug Administration.

Author information

Correspondence to Wei Liu.

Appendix 1: Model selection

In practice, after carrying out MCLE and MLE in joint latent class modeling for a fixed number of latent classes K, data analysts need to determine the optimal K. In this setting, a unified model selection strategy that applies to both MCLE and MLE is preferable. Here, we propose to use the simulated likelihood approach (Geyer and Thompson 1992; Xie et al. 2013), combined with the Akaike information criterion (AIC), to select the best K. Let \(\hat{\varvec{\theta }}\) denote the estimates obtained from the MCLE or MLE procedure. Note that the marginal likelihood (11), or equivalently (8), is obtained by summing over the discrete latent classes \(L_i\) and integrating over the continuous random effects \(\mathbf {b}_j\). By Monte Carlo integration, the maximized likelihood \(L(\hat{\varvec{\theta }})\) can be approximated by

$$\begin{aligned} \hat{L}(\hat{\varvec{\theta }})&= \frac{1}{\Lambda } \sum _{\lambda =1}^{\Lambda } \Biggl[ \prod _{i=1}^{I}\Biggl\{ \frac{e^{\,y_i\left( \hat{\beta }_0+\hat{\beta }_1L_i^{(\lambda )}+ \mathbf {w}_{i}^{\prime } \hat{\varvec{\gamma }}\right) }}{1+e^{\hat{\beta }_0+\hat{\beta }_1L_i^{(\lambda )}+\mathbf {w}_{i}^{\prime }\hat{\varvec{\gamma }}}}\\ &\quad \times \prod _{j=1}^{J}\frac{f^{\,u_{ij}}_{V_{ij}\mid L_i^{(\lambda )}, \mathbf {b}_j^{(\lambda )}, \mathbf {z}_{ij}}(v_{ij})\, e^{u_{ij}\left( \hat{\eta }_0+\hat{\eta }_1h\left( \hat{\mu }_{ij}(L_i^{(\lambda )}, \mathbf {b}_j^{(\lambda )}, \mathbf {z}_{ij}), \mathbf {t}_{ij},\hat{\varvec{\zeta }}\right) \right) }}{1+ e^{\hat{\eta }_0+\hat{\eta }_1h\left( \hat{\mu }_{ij}(L_i^{(\lambda )}, \mathbf {b}_j^{(\lambda )}, \mathbf {z}_{ij}), \mathbf {t}_{ij},\hat{\varvec{\zeta }}\right) }}\Biggr\} \Biggr] \end{aligned}$$

where \(\Lambda \) is the total number of Monte Carlo draws (\(\Lambda =10^{6}\) in the case study); \(L_i^{(\lambda )}\) is the \(\lambda \)th simulated realization, for the ith subject, from Multinomial\((1,(\hat{\pi }_0,\ldots ,\hat{\pi }_{K-1}))\); and \(\mathbf {b}_j^{(\lambda )}\) is the \(\lambda \)th simulated realization, for the jth biomarker, from \(N\left( (0, 0)^\prime , \left( \begin{array}{cc} \hat{\sigma }_0^2 & \hat{\rho }\hat{\sigma }_0\hat{\sigma }_1 \\ \hat{\rho }\hat{\sigma }_0\hat{\sigma }_1 & \hat{\sigma }_1^2 \end{array} \right) \right) \). Once \(\hat{L}(\hat{\varvec{\theta }})\) is obtained, the AIC for each candidate K is \(\mathrm{AIC}=-2\log \hat{L}(\hat{\varvec{\theta }})+2p\), where \(p\) is the number of free parameters in the model, and the K with the smallest AIC is selected.
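The Monte Carlo approximation and the AIC step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The model-specific pieces of the display (the conditional density \(f_{V_{ij}\mid \cdot }\), the link \(h\), and the mean \(\hat{\mu }_{ij}\)) are delegated to a hypothetical user-supplied callable, `log_complete_lik`. That callable returns the log of the product over \(i\) and \(j\) for one joint draw of the latent variables. Averaging is done on the log scale with a log-sum-exp shift. This is an implementation choice assumed here, because a product of \(I\times J\) factors typically underflows in double precision. The helpers `draw_class` and `draw_b` generate \(L_i^{(\lambda )}\) and \(\mathbf {b}_j^{(\lambda )}\) from the fitted multinomial and bivariate normal distributions. All function names are hypothetical.

```python
import math
import random


def draw_class(pi_hat, rng):
    """One draw of L_i from Multinomial(1, pi_hat), returned as a class label in 0..K-1."""
    return rng.choices(range(len(pi_hat)), weights=pi_hat)[0]


def draw_b(sigma0, sigma1, rho, rng):
    """One draw of b_j ~ N((0, 0)', [[s0^2, rho*s0*s1], [rho*s0*s1, s1^2]]) via a Cholesky factor."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return sigma0 * z1, sigma1 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)


def log_mean_exp(values):
    """Stable log((1/n) * sum(exp(v))): factor out the largest term before exponentiating."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values)) - math.log(len(values))


def simulated_loglik(log_complete_lik, draw_latents, n_draws, seed=0):
    """Monte Carlo estimate of log L(theta_hat) from n_draws joint draws of the latent variables."""
    rng = random.Random(seed)
    return log_mean_exp([log_complete_lik(draw_latents(rng)) for _ in range(n_draws)])


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params


def select_k(fits):
    """fits maps K -> (log-likelihood, number of free parameters); return the K with the smallest AIC."""
    return min(fits, key=lambda k: aic(*fits[k]))


if __name__ == "__main__":
    # Hypothetical toy: logistic disease model only (no biomarkers), K = 2 classes.
    pi_hat, beta0, beta1, y = [0.6, 0.4], -1.0, 2.0, [1, 0, 1]

    def p_disease(label):
        return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * label)))

    def log_complete_lik(labels):
        # log of prod_i P(y_i | L_i) at one joint draw of (L_1, ..., L_I)
        return sum(math.log(p_disease(label) if yi == 1 else 1.0 - p_disease(label))
                   for yi, label in zip(y, labels))

    def draw_latents(rng):
        return [draw_class(pi_hat, rng) for _ in y]

    approx = simulated_loglik(log_complete_lik, draw_latents, n_draws=20000, seed=1)
    exact = sum(math.log(sum(p * (p_disease(k) if yi == 1 else 1.0 - p_disease(k))
                             for k, p in enumerate(pi_hat))) for yi in y)
    print(approx, exact)
```

The `__main__` block uses a hypothetical toy with only the disease-status part and no biomarkers. This keeps the exact marginal log-likelihood computable by summing over classes, so it can be compared with the simulated value. In an actual analysis, `simulated_loglik` would be called once per candidate K, and the resulting log-likelihoods and parameter counts would be passed to `select_k`.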

About this article

Cite this article

Zhang, B., Liu, W., Zhang, H. et al. Composite likelihood and maximum likelihood methods for joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data. Comput Stat 31, 425–449 (2016). https://doi.org/10.1007/s00180-015-0597-3

  • DOI: https://doi.org/10.1007/s00180-015-0597-3
