Abstract
Gaussian copula regression models provide a flexible, intuitive framework for modeling dependent responses with a variety of marginal distributions. With non-continuous outcomes, however, the time required to compute the likelihood directly grows exponentially with sample size, and the existing alternatives have rarely been considered in a Bayesian framework. We conduct Bayesian inference for Gaussian copula regression models of non-continuous outcomes using three distinct approaches: the continuous extension, the distributional transform, and the composite likelihood. The latter two include a curvature correction. We also examine the shapes of the resulting posterior distributions and the computational performance of each approach. Our evaluation covers simulations of several types of non-continuous data as well as analyses of real data, with data sets and data types chosen to challenge these approaches. We assess the inference produced by each approach using frequentist criteria. The distributional transform with curvature correction has good to excellent coverage for discrete variables with numerous levels. It is also considerably faster than the other options considered, making it attractive for evaluating models of mutually dependent non-continuous responses. For responses with fewer levels, the composite likelihood may be the only viable option.
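To make the idea behind the distributional transform concrete, here is a minimal sketch of the randomized distributional transform described by Rüschendorf (2009): a discrete observation y is mapped to u = F(y−1) + V·P(Y = y), with V uniform on (0, 1), so that u is exactly uniform and can be passed to a continuous Gaussian copula through its normal score. The Poisson margin with a known mean, the function names, and the simulation helper are hypothetical choices for illustration only. The sketch does not implement the curvature correction, the regression structure, or the Bayesian sampler studied in the paper.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(y: int, lam: float) -> float:
    """P(Y <= y) for a hypothetical Poisson(lam) margin; 0.0 for y < 0."""
    if y < 0:
        return 0.0
    term = math.exp(-lam)  # P(Y = 0)
    total = term
    for k in range(1, y + 1):
        term *= lam / k  # P(Y = k) obtained from P(Y = k - 1)
        total += term
    return min(total, 1.0)


def distributional_transform(y: int, lam: float, v: float) -> float:
    """Randomized distributional transform: F(y - 1) + v * P(Y = y)."""
    lower = poisson_cdf(y - 1, lam)  # P(Y < y) for integer y
    upper = poisson_cdf(y, lam)      # P(Y <= y)
    return lower + v * (upper - lower)


def normal_score(u: float, eps: float = 1e-12) -> float:
    """Map a uniform value to the latent Gaussian scale of the copula.

    u is clipped away from 0 and 1 because inv_cdf requires 0 < u < 1.
    """
    u = min(max(u, eps), 1.0 - eps)
    return NormalDist().inv_cdf(u)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def transformed_sample(lam: float, n: int, seed: int = 0) -> list[float]:
    """Simulate n Poisson draws and apply the transform with fresh V each time."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = sample_poisson(lam, rng)
        out.append(distributional_transform(y, lam, rng.random()))
    return out
```

If the margin is specified correctly, the transformed values should have mean near 1/2 and variance near 1/12, which is what justifies evaluating a continuous copula density on them. The randomization V is one way to break ties between equal counts; this sketch makes no claim about which variant of the transform the paper uses.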






References
Bai Y, Kang J, Song PXK (2014) Efficient pairwise composite likelihood estimation for spatial-clustered data. Biometrics 70(3):661–670
Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC, New York
Byrd RH, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput 16(5):1190–1208
Casella G, Berger RL (2002) Statistical inference, 2nd edn. Duxbury Thomson Learning, Belmont
Chandler RE, Bate S (2007) Inference for clustered data using the independence loglikelihood. Biometrika 94(1):167–183
de Backer M, de Keyser P, de Vroey C, Lesaffre E (1996) A 12-week treatment for dermatophyte toe onychomycosis: terbinafine 250 mg/day vs. itraconazole 200 mg/day – a double-blind comparative trial. Br J Dermatol 134:16–17 (supplement 46)
Denuit M, Lambert P (2005) Constraints on concordance measures in bivariate discrete data. J Multivar Anal 93:40–57
Flegal J, Haran M, Jones G (2008) Markov chain Monte Carlo: can we trust the third significant figure? Stat Sci 23(2):250–260
Genest C, Nešlehová J (2007) A primer on copulas for count data. ASTIN Bull J IAA 37(2):475–515
Higgs MD, Hoeting JA (2010) A clipped latent variable model for spatially correlated ordered categorical data. Comput Stat Data Anal 54(8):1999–2011
Hughes J (2015) copCAR: a flexible regression model for areal data. J Comput Graph Stat 24(3):733–755
Joe H (2014) Dependence modeling with copulas. Chapman and Hall/CRC, New York
Kazianka H (2013) Approximate copula-based estimation and prediction of discrete spatial data. Stoch Environ Res Risk Assess 27:2015–2026
Kazianka H, Pilz J (2010) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24:661–673
Kolev N, Paiva D (2009) Copula-based regression models: a survey. J Stat Plann Inference 139:3847–3856
Lindsay BG (1988) Composite likelihood methods. In: Statistical inference from stochastic processes: proceedings of the AMS-IMS-SIAM joint summer research conference held August 9–15, 1987, with support from the National Science Foundation and the Army Research Office. Contemporary mathematics, vol 80. American Mathematical Society, pp 221–239
Lindsay BG, Yi GY, Sun J (2011) Issues and strategies in the selection of composite likelihoods. Stat Sin 21:71–105
Madsen L (2009) Maximum likelihood estimation of regression parameters with spatially dependent discrete data. J Agric Biol Environ Stat 14(4):375–391
Madsen L, Fang Y (2011) Joint regression analysis for discrete longitudinal data. Biometrics 67(3):1171–1175
Marbac M, Biernacki C, Vandewalle V (2017) Model-based clustering of Gaussian copulas for mixed data. Commun Stat Theory Methods 46(23):11635–11656
Molenberghs G, Verbeke G (2005) Models for discrete longitudinal data. Springer, New York
Noland GS, Ayodo G, Abuya J, Hodges JS, Rolfes MA, John CC (2012) Decreased prevalence of anemia in highland areas of low malaria transmission after a 1-year interruption of transmission. Clin Infect Dis 54(2):178–184
Pitt M, Chan D, Kohn R (2006) Efficient Bayesian inference for Gaussian copula regression models. Biometrika 93(3):537–554
Qu A, Song PXK (2004) Assessing robustness of generalised estimating equations and quadratic inference functions. Biometrika 91(2):447–459
Ribatet M, Cooley D, Davison A (2012) Bayesian inference for composite likelihood models and an application to spatial extremes. Stat Sin 22:813–845
Robert CP, Casella G (2004) Monte Carlo statistical methods, 2nd edn. Springer, New York
Rüschendorf L (2009) On the distributional transform, Sklar’s theorem, and the empirical copula process. J Stat Plan Inference 139(11):3921–3927
Smith MS, Khaled MA (2012) Estimation of copula models with discrete margins via Bayesian data augmentation. J Am Stat Assoc 107(497):290–303
Song PXK (2000) Multivariate dispersion models generated from Gaussian copula. Scand J Stat 27(2):305–320
Song PXK, Li M, Yuan Y (2009) Joint regression analysis of correlated data using Gaussian copulas. Biometrics 65:60–68
Towns J, Cockerill T, Dahan M, Foster I, Gaither K, Grimshaw A, Hazlewood V, Lathrop S, Lifka D, Peterson GD et al (2014) XSEDE: accelerating scientific discovery. Comput Sci Eng 16(5):62–74
Varin C (2008) On composite marginal likelihoods. AStA Adv Stat Anal 92(1):1–28
Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21(1):5–42
Yang L, Frees EW, Zhang Z (2020) Nonparametric estimation of copula regression models with discrete outcomes. J Am Stat Assoc 115(530):707–720
Acknowledgements
This work was carried out in part using computing resources provided by the Minnesota Supercomputing Institute (MSI) at the University of Minnesota (URL: http://www.msi.umn.edu). This work also used the Extreme Science and Engineering Discovery Environment (XSEDE) (Towns et al. 2014), specifically the Comet resource, which is supported by National Science Foundation Grant Number ACI-1548562, through allocation DMS180005. The support of Jaime Vega was instrumental in preparing several figures in this manuscript. John Hughes provided guidance in the early phases of this work. We are also indebted to Jim Hodges for many helpful discussions. We are grateful to the Montgomery County, Maryland Department of Environmental Protection for providing the benthic narrative data set. Finally, the review team provided many helpful suggestions and perspectives that improved the clarity of this manuscript.
Ethics declarations
Conflict of interest
The author declares that he/she has no conflict of interest.
About this article
Cite this article
Henn, L.L. Limitations and performance of three approaches to Bayesian inference for Gaussian copula regression models of discrete data. Comput Stat 37, 909–946 (2022). https://doi.org/10.1007/s00180-021-01131-1