Limitations and performance of three approaches to Bayesian inference for Gaussian copula regression models of discrete data

  • Original paper
  • Computational Statistics

Abstract

Gaussian copula regression models provide a flexible, intuitive framework for modeling dependent responses with a variety of marginal distributions. With non-continuous outcomes, the time required to compute the likelihood directly grows exponentially with sample size. The alternatives that exist have rarely been considered in a Bayesian framework. We conduct Bayesian inference for Gaussian copula regression models of non-continuous outcomes using three distinct approaches: the continuous extension, the distributional transform, and the composite likelihood. The latter two include a curvature correction. We also consider the shapes of the posterior distributions and computational performance. We examine both simulations of several types of non-continuous data and analyses of real data; data sets and data types were chosen to challenge the performance of these approaches. We evaluate the resulting inference using frequentist methods. The distributional transform with curvature correction has good to excellent coverage for discrete variables with numerous levels. It is also considerably faster than the other options considered, making it attractive for evaluating models of mutually dependent non-continuous responses. For responses with fewer levels, composite likelihood may be the only viable option.
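To make the distributional transform concrete, the following is a minimal sketch of the approximate log-likelihood as it is commonly described in the copula literature, not the article's own implementation. It assumes Poisson margins, and the names `dt_loglik`, `mu`, and `R` are hypothetical. Each count is mapped to the midpoint of its CDF jump, converted to a normal score, and plugged into the Gaussian copula density. The curvature correction mentioned in the abstract is not included.

```python
import numpy as np
from scipy import stats


def dt_loglik(y, mu, R):
    """Approximate Gaussian copula log-likelihood for counts (sketch).

    y  : observed counts (assumed Poisson margins; an illustrative choice)
    mu : marginal Poisson means, one per observation
    R  : copula correlation matrix, shape (n, n)
    """
    y = np.asarray(y)
    mu = np.asarray(mu, dtype=float)

    # Distributional transform: midpoint of the CDF jump at each y_i.
    upper = stats.poisson.cdf(y, mu)
    lower = stats.poisson.cdf(y - 1, mu)  # equals 0 when y_i = 0
    u = 0.5 * (lower + upper)

    # Normal scores of the transformed observations.
    z = stats.norm.ppf(u)

    # Log Gaussian copula density: joint normal minus independent normals.
    log_copula = (
        stats.multivariate_normal.logpdf(z, mean=np.zeros(len(z)), cov=R)
        - stats.norm.logpdf(z).sum()
    )

    # Add the marginal log-probabilities.
    return log_copula + stats.poisson.logpmf(y, mu).sum()
```

The cost is dominated by a single multivariate normal density evaluation, which is one way to see why this approach can be much faster than evaluating the exact likelihood. With `R` equal to the identity matrix, the copula term vanishes and the function reduces to the independent Poisson log-likelihood.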

References

  • Bai Y, Kang J, Song PXK (2014) Efficient pairwise composite likelihood estimation for spatial-clustered data. Biometrics 70(3):661–670

  • Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC, New York

  • Byrd RH, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16(5):1190–1208

  • Casella G, Berger RL (2002) Statistical inference, 2nd edn. Duxbury Thomson Learning, Belmont

  • Chandler RE, Bate S (2007) Inference for clustered data using the independence loglikelihood. Biometrika 94(1):167–183

  • de Backer M, de Keyser P, de Vroey C, Lesaffre E (1996) A 12-week treatment for dermatophyte toe onychomycosis: terbinafine 250 mg/day vs. itraconazole 200 mg/day, a double-blind comparative trial. Br J Dermatol 134(Suppl 46):16–17

  • Denuit M, Lambert P (2005) Constraints on concordance measures in bivariate discrete data. J Multivar Anal 93:40–57

  • Flegal J, Haran M, Jones G (2008) Markov chain Monte Carlo: can we trust the third significant figure? Stat Sci 23(2):250–260

  • Genest C, Nešlehová J (2007) A primer on copulas for count data. ASTIN Bull J IAA 37(2):475–515

  • Higgs MD, Hoeting JA (2010) A clipped latent variable model for spatially correlated ordered categorical data. Comput Stat Data Anal 54(8):1999–2011

  • Hughes J (2015) copCAR: a flexible regression model for areal data. J Comput Graph Stat 24(3):733–755

  • Joe H (2014) Dependence modeling with copulas. Chapman and Hall/CRC, New York

  • Kazianka H (2013) Approximate copula-based estimation and prediction of discrete spatial data. Stoch Environ Res Risk Assess 27:2015–2026

  • Kazianka H, Pilz J (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24:661–673

  • Kolev N, Paiva D (2009) Copula-based regression models: a survey. J Stat Plann Inference 139:3847–3856

  • Lindsay BG (1988) Composite likelihood methods. In: Statistical inference from stochastic processes: proceedings of the AMS-IMS-SIAM joint summer research conference held August 9–15, 1987. Contemporary mathematics, vol 80. American Mathematical Society, pp 221–239

  • Lindsay BG, Yi GY, Sun J (2011) Issues and strategies in the selection of composite likelihoods. Stat Sin 21:71–105

  • Madsen L (2009) Maximum likelihood estimation of regression parameters with spatially dependent discrete data. J Agric Biol Environ Stat 14(4):375–391

  • Madsen L, Fang Y (2011) Joint regression analysis for discrete longitudinal data. Biometrics 67(3):1171–1175

  • Marbac M, Biernacki C, Vandewalle V (2017) Model-based clustering of Gaussian copulas for mixed data. Commun Stat Theory Methods 46(23):11635–11656

  • Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York

  • Noland GS, Ayodo G, Abuya J, Hodges JS, Rolfes MA, John CC (2012) Decreased prevalence of anemia in highland areas of low malaria transmission after a 1-year interruption of transmission. Clin Infect Dis 54(2):178–184

  • Pitt M, Chan D, Kohn R (2006) Efficient Bayesian inference for Gaussian copula regression models. Biometrika 93(3):537–554

  • Qu A, Song PXK (2004) Assessing robustness of generalised estimating equations and quadratic inference functions. Biometrika 91(2):447–459

  • Ribatet M, Cooley D, Davison A (2012) Bayesian inference for composite likelihood models and an application to spatial extremes. Stat Sin 22:813–845

  • Robert CP, Casella G (2004) Monte Carlo statistical methods, 2nd edn. Springer, New York

  • Rüschendorf L (2009) On the distributional transform, Sklar’s theorem, and the empirical copula process. J Stat Plan Inference 139(11):3921–3927

  • Smith MS, Khaled MA (2012) Estimation of copula models with discrete margins via Bayesian data augmentation. J Am Stat Assoc 107(497):290–303

  • Song PXK (2000) Multivariate dispersion models generated from Gaussian copula. Scand J Stat 27(2):305–320

  • Song PXK, Li M, Yuan Y (2009) Joint regression analysis of correlated data using Gaussian copulas. Biometrics 65:60–68

  • Towns J, Cockerill T, Dahan M, Foster I, Gaither K, Grimshaw A, Hazlewood V, Lathrop S, Lifka D, Peterson GD et al (2014) XSEDE: accelerating scientific discovery. Comput Sci Eng 16(5):62–74

  • Varin C (2008) On composite marginal likelihoods. AStA Adv Stat Anal 92(1):1–28

  • Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21(1):5–42

  • Yang L, Frees EW, Zhang Z (2020) Nonparametric estimation of copula regression models with discrete outcomes. J Am Stat Assoc 115(530):707–720

Acknowledgements

This work was carried out in part using computing resources provided by the Minnesota Supercomputing Institute (MSI) at the University of Minnesota (http://www.msi.umn.edu). The support of Jaime Vega was instrumental in the preparation of several figures in this manuscript. This work additionally used the Extreme Science and Engineering Discovery Environment (XSEDE) (Towns et al. 2014), specifically the resource comet, which is supported by National Science Foundation Grant Number ACI-1548562, through allocation DMS180005. John Hughes provided guidance in the early phases of this work. We are also indebted to Jim Hodges for many helpful discussions. We are grateful to the Montgomery County, Maryland Department of Environmental Protection for providing the benthic narrative data set. Finally, the review team provided many helpful suggestions and perspectives that improved the clarity of this manuscript.

Author information

Corresponding author

Correspondence to L. L. Henn.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 124 KB)

About this article

Cite this article

Henn, L.L. Limitations and performance of three approaches to Bayesian inference for Gaussian copula regression models of discrete data. Comput Stat 37, 909–946 (2022). https://doi.org/10.1007/s00180-021-01131-1

  • DOI: https://doi.org/10.1007/s00180-021-01131-1
