A Bayesian approach to the analysis of asymmetric association for two-way contingency tables

Wei, Zheng; Kim, Daeyoung; Conlon, Erin M.

doi:10.1007/s00180-021-01161-9


Original paper
Published: 25 October 2021

Volume 37, pages 1311–1338, (2022)
Computational Statistics

Abstract

Recently, a subcopula-based asymmetric association measure was developed for the variables in two-way contingency tables. Here, we develop a fully Bayesian method to implement this measure, and examine its performance using simulated data and several real colorectal cancer data sets. We use the coverage probabilities and lengths of the interval estimators to compare the Bayesian approach with a large-sample method of analysis. In simulation studies, we find that the Bayesian method outperforms the large-sample method on average; in the real data analyses, it provides either similar or improved results.


References

Agresti A (2010) Analysis of ordinal categorical data, vol 656. John Wiley & Sons
Agresti A, Hitchcock DB (2005) Bayesian inference for categorical data analysis. Stat Methods Appl 14(3):297–330
Agresti A, Kateri M (2011) Categorical data analysis. Springer
Agresti A, Min Y (2005) Frequentist performance of Bayesian confidence intervals for comparing proportions in $2 \times 2$ contingency tables. Biometrics 61(2):515–523
Baker SG (1994) The multinomial-Poisson transformation. J Roy Stat Soc Ser D 43(4):495–504
Beh EJ, Lombardo R (2014) Correspondence analysis: theory, practice and new strategies. John Wiley & Sons
Beh EJ, Simonetti B, D’Ambra L (2007) Partitioning a non-symmetric measure of association for three-way contingency tables. J Multivar Anal 98(7):1391–1411
Bové DS, Held L (2011) Hyper-$g$ priors for generalized linear models. Bayesian Anal 6(3):387–410
Choi L, Blume JD, Dupont WD (2015) Elucidating the foundations of statistical inference with 2 x 2 tables. PLOS ONE 10(4):e0121263
Dellaportas P, Forster JJ (1999) Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika 86(3):615–633
D’Ambra L, Lauro NC (1992) Non symmetrical exploratory data analysis. Statistica Applicata 4(4):511–529
Forster JJ (2010) Bayesian inference for Poisson and multinomial log-linear models. Stat Methodol 7(3):210–224
Gamerman D (1997) Sampling from the posterior distribution in generalized linear mixed models. Stat Comput 7(1):57–68
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis. CRC Press
Geweke J (1992) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics, vol 4. Oxford University Press, Oxford, pp 169–193
Ghosh M, Zhang L, Mukherjee B (2006) Equivalence of posteriors in the Bayesian analysis of the multinomial-Poisson transformation. Metron Int J Stat 1(64):19–28
Goodman LA, Kruskal WH (1954) Measures of association for cross classifications. J Am Stat Assoc 49(268):732–764
Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91(435):1343–1370
Kawasaki T, Ohnishi M, Nosho K, Suemoto Y, Kirkner GJ, Meyerhardt JA, Fuchs CS, Ogino S (2008) CpG island methylator phenotype-low (CIMP-low) colorectal cancer shows not only few methylated CIMP-high-specific CpG islands, but also low-level methylation at individual loci. Modern Pathol 21(3):245
Larntz K (1978) Small-sample comparisons of exact levels for chi-squared goodness-of-fit statistics. J Am Stat Assoc 73:253–263
Margolin AA, Nemenman I, Basso K, Wiggins C, Stolovitzky G, Dalla Favera R, Califano A (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinform 7(Suppl 1):S7
Mukerjee R, Reid N (2001) Comparison of test statistics via expected lengths of associated confidence intervals. J Stat Plan Infer 97:141–151
Nasierowska-Guttmejer A, Trzeciak L, Nowacki MP, Ostrowski J (2000) p53 protein accumulation and p53 gene mutation in colorectal cancer. Pathol Oncol Res 6(4):275–279
Nelsen R (2006) An introduction to copulas, 2nd edn. Springer, New York
Newcombe RG (2011) Measures of location for confidence intervals for proportions. Commun Stat Theory Methods 40(10):1743–1767
Ntzoufras I, Dellaportas P, Forster JJ (2003) Bayesian variable and link determination for generalised linear models. J Stat Plan Infer 111(1–2):165–180
Ogino S, Kawasaki T, Kirkner GJ, Loda M, Fuchs CS (2006) CpG island methylator phenotype-low (CIMP-low) in colorectal cancer: possible associations with male sex and KRAS mutations. J Mol Diag 8(5):582–588
Overstall AM, King R (2014a) conting: An R package for Bayesian analysis of complete and incomplete contingency tables. J Stat Softw 58(7):1–27
Overstall AM, King R (2014b) A default prior distribution for contingency tables with dependent factor levels. Stat Methodol 16:90–99
Overstall AM, King R, Bird SM, Hutchinson SJ, Hay G (2014) Incomplete contingency tables with censored cells with application to estimating the number of people who inject drugs in Scotland. Stat Med 33(9):1564–1579
Park H, Leemis LM (2019) Ensemble confidence intervals for binomial proportions. Stat Med 38:3460–3475
Poole D, Raftery AE (2000) Inference for deterministic simulation models: the Bayesian melding approach. J Am Stat Assoc 95:1244–1255
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Raftery AE (1996) Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika 83(2):251–266
Rambau PF, Odida M, Wabinga H (2008) p53 expression in colorectal carcinoma in relation to histopathological features in Ugandan patients. Afr Health Sci 8(4):234–238
Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Stat Univ Paris 8:229–231
van Someren EP, Wessels LFA, Backer E, Reinders MJT (2002) Genetic network modeling. Pharmacogenomics 3(4):507–525
Theil H (1970) On the estimation of relationships involving qualitative variables. Am J Sociol 76(1):103–154
Wei Z, Kim D (2017) Subcopula-based measure of asymmetric association for contingency tables. Stat Med 36(24):3875–3894
Zhang Y, Song M (2013) Deciphering interactions in causal networks without parametric assumptions. arXiv preprint http://arxiv.org/abs/1311.2707
Zhong H, Song M (2019) A fast exact functional test for directional association and cancer biology applications. IEEE/ACM Trans Comput Biol Bioinform 16(3):818–826

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Maine, Orono, ME, USA
Zheng Wei
Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA, USA
Daeyoung Kim & Erin M. Conlon


Corresponding author

Correspondence to Erin M. Conlon.


Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 372 KB)

Appendix

1 Asymptotic distribution of the subcopula-based asymmetric association measures

The asymptotic distributions of the estimators for the subcopula-based asymmetric association measures provided in Eqs. (4) and (5) are developed in the following theorem.

Theorem 1

(Wei and Kim 2017) Let $\hat{\rho}^{2}_{(X \rightarrow Y)}$ be the estimator for the asymmetric measure $\rho^{2}_{(X \rightarrow Y)}$ defined in Eq. (4). Then,

$$\sqrt{n}\left( \hat{\rho}^{2}_{(X \rightarrow Y)} - \rho^{2}_{(X \rightarrow Y)} \right) \overset{D}{\rightarrow} N\left( 0,\ \nabla h_1(\boldsymbol{p})^{T} \left( \operatorname{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^{T} \right) \nabla h_1(\boldsymbol{p}) \right), \tag{17}$$

where $\boldsymbol{p} = (p_{11}, p_{12}, \ldots, p_{I(J-1)}, p_{IJ})^{T}$, $\nabla h_1(\boldsymbol{p}) = \begin{bmatrix} \frac{\partial \rho^{2}_{(X \rightarrow Y)}}{\partial p_{11}}, \frac{\partial \rho^{2}_{(X \rightarrow Y)}}{\partial p_{12}}, \ldots, \frac{\partial \rho^{2}_{(X \rightarrow Y)}}{\partial p_{IJ}} \end{bmatrix}^{T}$, and $\frac{\partial \rho^{2}_{(X \rightarrow Y)}}{\partial p_{st}}$ is given in Eqs. (A.4)–(A.8) in Wei and Kim (2017), for $s = 1, \ldots, I$ and $t = 1, \ldots, J$. Let $\nabla h_2(\boldsymbol{p}) = \begin{bmatrix} \frac{\partial \rho^{2}_{(Y \rightarrow X)}}{\partial p_{11}}, \frac{\partial \rho^{2}_{(Y \rightarrow X)}}{\partial p_{12}}, \ldots, \frac{\partial \rho^{2}_{(Y \rightarrow X)}}{\partial p_{IJ}} \end{bmatrix}^{T}$, where $\frac{\partial \rho^{2}_{(Y \rightarrow X)}}{\partial p_{st}}$ is given in Eqs. (A.9)–(A.14) in Wei and Kim (2017). Then the asymptotic distribution of $\hat{\rho}^{2}_{(Y \rightarrow X)}$ is obtained by substituting $\nabla h_2(\boldsymbol{p})$ for $\nabla h_1(\boldsymbol{p})$ in Eq. (17).

Based on Theorem 1, the following corollary gives the asymptotic distribution of the difference between the estimators of the asymmetric measures, $\hat{\rho}^{2}_{(X \rightarrow Y)} - \hat{\rho}^{2}_{(Y \rightarrow X)}$:

Corollary 1

Let $\hat{\rho}^{2}_{(X \rightarrow Y)}$ and $\hat{\rho}^{2}_{(Y \rightarrow X)}$ be the estimators for the asymmetric measures $\rho^{2}_{(X \rightarrow Y)}$ and $\rho^{2}_{(Y \rightarrow X)}$ defined in Eqs. (4) and (5). Then,

$$\begin{aligned} &\sqrt{n}\left( \left( \hat{\rho}^{2}_{(X \rightarrow Y)} - \hat{\rho}^{2}_{(Y \rightarrow X)} \right) - \left( \rho^{2}_{(X \rightarrow Y)} - \rho^{2}_{(Y \rightarrow X)} \right) \right) \overset{D}{\rightarrow} \\ &\quad N\left( 0,\ \left( \nabla h_1(\boldsymbol{p}) - \nabla h_2(\boldsymbol{p}) \right)^{T} \left( \operatorname{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^{T} \right) \left( \nabla h_1(\boldsymbol{p}) - \nabla h_2(\boldsymbol{p}) \right) \right), \end{aligned} \tag{18}$$

where $\boldsymbol{p}$, $\nabla h_1(\boldsymbol{p})$, and $\nabla h_2(\boldsymbol{p})$ are given in Theorem 1.

From Theorem 1 and Corollary 1, the large-sample confidence intervals for $\rho^{2}_{(X \rightarrow Y)}$, $\rho^{2}_{(Y \rightarrow X)}$, and $\rho^{2}_{(X \rightarrow Y)} - \rho^{2}_{(Y \rightarrow X)}$ can be obtained.
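As a rough illustration of how Eqs. (17) and (18) might be turned into intervals, the sketch below builds a Wald-type interval $h(\hat{\boldsymbol{p}}) \pm z_{1-\alpha/2}\sqrt{\nabla h(\hat{\boldsymbol{p}})^{T}(\operatorname{diag}(\hat{\boldsymbol{p}}) - \hat{\boldsymbol{p}}\hat{\boldsymbol{p}}^{T})\nabla h(\hat{\boldsymbol{p}})/n}$ by plugging the sample proportions $\hat{\boldsymbol{p}}$ into the asymptotic variance. Several parts are assumptions rather than the authors' implementation: the plug-in step itself, the central-difference gradient (used in place of the closed-form derivatives in Wei and Kim 2017), and the functions `toy_xy` and `toy_yx`, which are hypothetical smooth stand-ins for the measures of Eqs. (4) and (5) and are not the subcopula-based measures. All names and the example counts are illustrative.

```python
import numpy as np
from statistics import NormalDist


def numerical_gradient(h, p, eps=1e-6):
    """Central-difference gradient of h at p (one coordinate per table cell)."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for k in range(p.size):
        step = np.zeros_like(p)
        step[k] = eps
        grad[k] = (h(p + step) - h(p - step)) / (2 * eps)
    return grad


def delta_method_interval(h, counts, level=0.95):
    """Wald-type interval for h(p) using the multinomial delta-method variance."""
    counts = np.asarray(counts, dtype=float).ravel()  # row-major cell order
    n = counts.sum()
    p_hat = counts / n
    grad = numerical_gradient(h, p_hat)
    # Multinomial covariance of sqrt(n) * p_hat: diag(p) - p p^T
    cov = np.diag(p_hat) - np.outer(p_hat, p_hat)
    var = grad @ cov @ grad / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    estimate = h(p_hat)
    half_width = z * np.sqrt(max(var, 0.0))
    return estimate, estimate - half_width, estimate + half_width


# Hypothetical stand-ins for a pair of directional measures on a 2 x 2 table
# with cells ordered (p11, p12, p21, p22); NOT the subcopula-based measures.
def toy_xy(p):
    det = p[0] * p[3] - p[1] * p[2]
    return det ** 2 / ((p[0] + p[1]) * (p[2] + p[3]))


def toy_yx(p):
    det = p[0] * p[3] - p[1] * p[2]
    return det ** 2 / ((p[0] + p[2]) * (p[1] + p[3]))


def toy_difference(p):
    # The gradient of a difference is the difference of gradients, as in Eq. (18).
    return toy_xy(p) - toy_yx(p)


if __name__ == "__main__":
    counts = [[30, 20], [10, 40]]  # made-up counts for illustration only
    for name, h in [("X->Y", toy_xy), ("Y->X", toy_yx), ("difference", toy_difference)]:
        est, lower, upper = delta_method_interval(h, counts)
        print(f"{name}: estimate={est:.4f}, interval=({lower:.4f}, {upper:.4f})")
```

Because $(\operatorname{diag}(\boldsymbol{p}) - \boldsymbol{p}\boldsymbol{p}^{T})\boldsymbol{1} = \boldsymbol{0}$, any gradient component along the all-ones direction drops out of the variance, so perturbing each cell separately, which briefly leaves the probability simplex, does not change the result. Such intervals are not constrained to the natural range of the measure and may need truncation in practice.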

2 Sensitivity analysis results

Figures 3 and 4 show the results of the sensitivity analysis in Sect. 4.2.

3 Diagnostic plots

Here, we provide history plots, autocorrelation function plots, and partial autocorrelation function plots for selected parameters in both the simulation studies and the real data studies (Figs. 5, 6, 7). See Sect. 3.4 for details. Plots for additional parameters are provided in Section S2 of the Supplementary Material.


About this article

Cite this article

Wei, Z., Kim, D. & Conlon, E.M. A Bayesian approach to the analysis of asymmetric association for two-way contingency tables. Comput Stat 37, 1311–1338 (2022). https://doi.org/10.1007/s00180-021-01161-9


Received: 29 November 2020
Accepted: 21 September 2021
Published: 25 October 2021
Issue Date: July 2022
DOI: https://doi.org/10.1007/s00180-021-01161-9
