A Markov chain sampler for contingency table exact inference

Yuan, Ao; Yang, Yimin

doi:10.1007/BF02736123

Published: 01 March 2005

Computational Statistics, Volume 20, pages 63–80 (2005)

Ao Yuan¹ & Yimin Yang²

Summary

In inference for contingency tables, when the cell counts are too small for asymptotic approximations to be reliable, exact conditional methods are used, but these are often computationally impractical for large tables. Various sampling methods can be used instead. Monte Carlo sampling based on permutation can itself become impractical for large tables. The existing Markov chain method updates only a few elements of the table at each iteration and is therefore inefficient. Here we consider a Markov chain in which a sub-table of user-specified size is updated at each iteration, which achieves high sampling efficiency. Some theoretical properties of the chain and its applications to some commonly used tables are discussed. As an illustration, the method is applied to the exact test of Hardy-Weinberg equilibrium in a population genetics context.
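The sub-table idea described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes a two-way table with fixed row and column totals, and it assumes the target is the usual conditional (multiple hypergeometric) distribution, so that a sub-table can be redrawn by randomly re-pairing its row and column labels. The function names `resample_subtable` and `sample_tables` and the parameters `sub_rows` and `sub_cols` are hypothetical.

```python
import random

def resample_subtable(table, rows, cols, rng):
    """Redraw the cells in rows x cols, keeping the sub-table's own margins.

    Assumption: under fixed margins and independence, the sub-table given the
    rest of the table is multiple hypergeometric, which a random pairing of
    row labels with column labels reproduces.
    """
    row_sums = [sum(table[i][j] for j in cols) for i in rows]
    col_sums = [sum(table[i][j] for i in rows) for j in cols]
    # One label per observation in the sub-table.
    row_labels = [i for i, r in zip(rows, row_sums) for _ in range(r)]
    col_labels = [j for j, c in zip(cols, col_sums) for _ in range(c)]
    rng.shuffle(col_labels)
    for i in rows:
        for j in cols:
            table[i][j] = 0
    for i, j in zip(row_labels, col_labels):
        table[i][j] += 1

def sample_tables(table, n_iter, sub_rows, sub_cols, seed=0):
    """Yield successive states of the chain; the input table is not modified."""
    rng = random.Random(seed)
    current = [row[:] for row in table]
    n_rows, n_cols = len(current), len(current[0])
    for _ in range(n_iter):
        # Pick a sub-table of the user-specified size uniformly at random.
        rows = rng.sample(range(n_rows), sub_rows)
        cols = rng.sample(range(n_cols), sub_cols)
        resample_subtable(current, rows, cols, rng)
        yield [row[:] for row in current]

if __name__ == "__main__":
    observed = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
    for state in sample_tables(observed, n_iter=3, sub_rows=2, sub_cols=2):
        print(state)
```

Because each update only rearranges counts inside the chosen sub-table while preserving that sub-table's margins, the row and column totals of the full table never change. Larger values of `sub_rows` and `sub_cols` change more cells per iteration at a higher cost per step.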

References

Agresti, A. 1990, Categorical Data Analysis, John Wiley & Sons.
Baglivo, J., Oliver, D. & Pagano, M. 1988, ‘Methods for the analysis of contingency tables with large and small cell counts’, Journal of the American Statistical Association 83, 1006–1013.
Bishop, Y., Feinberg, S. & Pagano, M. 1975, Discrete Multivariate Analysis, Cambridge, MA: MIT Press.
Blackman, R. & Tukey, J. 1958, The Measurement of Power Spectra, New York: Dover.
Chung, K. 1960, Markov Processes with Stationary Transition Probabilities, Heidelberg: Springer-Verlag.
Cox, M. & Plackett, R. 1980, ‘Small samples in contingency tables’, Biometrika 67, 1–13.
Diaconis, P. & Stroock, D. 1991, ‘Geometric bounds for eigenvalues of Markov chains’, The Annals of Applied Probability 1, 36–61.
Fill, J. 1991, ‘Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process’, The Annals of Applied Probability 1, 62–87.
Fisher, R. 1925, Statistical Methods for Research Workers, 13th Edition, Hafner, New York.
Fisher, R. 1935, The Design of Experiments (1st ed.), Edinburgh, London: Oliver & Boyd (7th edition 1960).
Gelman, A. & Rubin, D. 1992, ‘Inference from iterative simulation using multiple sequences’, Statistical Science 7, 457–472.
Geman, S. & Geman, D. 1984, ‘Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images’, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-6, 721–741.
Geweke, J. 1992, ‘Evaluating the accuracy of sampling-based approaches to calculating posterior moments’, in Bayesian Statistics 4 (ed. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith), Clarendon Press, Oxford, UK.
Guo, S. & Thompson, E. 1992, ‘Performing the exact test of Hardy-Weinberg proportion for multiple alleles’, Biometrics 48, 361–372.
Hannan, E. 1957, ‘The variance of the mean of a stationary process’, Journal of the Royal Statistical Society B 19, 282–285.
Hastings, W. 1970, ‘Monte Carlo sampling methods using Markov chains and their applications’, Biometrika 57, 97–109.
Heidelberger, P. & Welch, P. 1983, ‘Simulation run length control in the presence of an initial transient’, Operations Research 31, 1109–1144.
Hernandez, J. & Weir, B. 1989, ‘A disequilibrium coefficient approach to Hardy-Weinberg testing’, Biometrics 45, 53–70.
Jowett, G. 1955, ‘The comparison of means of sets of observations from sections of independent stochastic series’, Journal of the Royal Statistical Society B 17, 208–227.
Kolassa, J. & Tanner, M. 1994, ‘Approximate conditional inference in exponential families via the Gibbs sampler’, Journal of the American Statistical Association 89, 697–702.
Lange, K. 1997, Mathematical and Statistical Methods for Genetic Analysis, Springer.
Levine, H. 1970, ‘On a matching problem arising in genetics’, Annals of Mathematical Statistics 20, 91–94.
Liu, J., Wong, W. & Kong, A. 1995, ‘Covariance structure and convergence rate of the Gibbs sampler with various scans’, Journal of the Royal Statistical Society B 57, 157–169.
Louis, E. & Dempster, E. 1987, ‘An exact test for Hardy-Weinberg and multiple alleles’, Biometrics 43, 805–811.
Mehta, C. & Patel, N. 1983, ‘A network algorithm for performing Fisher’s exact test in r × c contingency tables’, Journal of the American Statistical Association 78, 427–434.
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. & Teller, E. 1953, ‘Equations of state calculations by fast computing machines’, Journal of Chemical Physics 21, 1087–1092.
Pagano, M. & Halvorsen, K. 1981, ‘An algorithm for finding the exact significance levels for r × c contingency tables’, Journal of the American Statistical Association 76, 931–934.
Patefield, W. 1981, ‘An efficient method of generating random r × c tables with given row and column totals (Algorithm AS 159)’, Applied Statistics 30, 91–97.
Raftery, A. & Lewis, S. 1992, ‘How many iterations in the Gibbs sampler?’, in Bayesian Statistics 4 (ed. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith), 763–774, Clarendon Press, Oxford, UK.
Skovgaard, I. 1987, ‘Saddlepoint expansions for conditional distributions’, Journal of Applied Probability 24, 875–887.
Stout, W. 1974, Almost Sure Convergence, Academic Press.

Acknowledgements

We thank the editor and the two referees whose comments/suggestions improved the quality of the paper.

Author information

Authors and Affiliations

Statistical Genetics and Bioinformatics Unit, National Human Genome Center, Howard University, 20059, Washington DC, USA
Ao Yuan
Department of Mathematics and Physics, Beijing Technology and Business University, 100037, Beijing, PR China
Yimin Yang

Appendix: Proof of the Theorem

(i)
Irreducibility: We need to show that, for all T, T′ ∈ Γ₀, T′ can be obtained from T through a finite number of transitions, so that P(T′|T) > 0 (Hastings, 1970). Let (i₁, i₂, j₁, j₂) denote the basic move that decreases the count by 1 at each of the positions (i₁, j₁) and (i₂, j₂) and increases the count by 1 at each of the positions (i₁, j₂) and (i₂, j₁). Clearly, any basic move leaves the boundary condition unchanged. In the case of the row plus column constraint with sub-table requirement (b), and in the other cases with sub-table requirement (a), T′ can be obtained from T by a finite number $ S=\frac{1}{4}\sum_{ij}|T_{ij}^{'}-T_{ij}| $ of basic moves e_i. Each basic move changes four counts at four positions in T, which can be covered by a sub-table D_i of dimension no less than two (in the row plus column constraint case, by at most two sub-tables of dimension no less than three). Since P(D_i) > 0 and P(e_i|D_i) > 0, and since the transition from T to T′ may also be achieved in other ways, we have
$$ P(T^{'}|T)\geq\prod_{i=1}^SP(D_i)P(e_i|D_i)>0. $$
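
The basic move defined above can be illustrated with a short sketch. It applies one move (i₁, i₂, j₁, j₂) to a small table, confirms that the row and column totals are unchanged, and evaluates the quantity S from the text for a pair of tables. The function names and the example table are hypothetical, and indices are zero-based here.

```python
def apply_basic_move(table, i1, i2, j1, j2):
    """Return a copy of `table` after the basic move (i1, i2, j1, j2)."""
    new = [row[:] for row in table]
    new[i1][j1] -= 1  # decrease at (i1, j1)
    new[i2][j2] -= 1  # decrease at (i2, j2)
    new[i1][j2] += 1  # increase at (i1, j2)
    new[i2][j1] += 1  # increase at (i2, j1)
    return new

def margins(table):
    """Row totals and column totals of a two-way table."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]

def move_count(t_from, t_to):
    """The quantity S = (1/4) * sum_ij |T'_ij - T_ij| as written in the text."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(t_from, t_to)
                for a, b in zip(row_a, row_b))
    return total / 4

if __name__ == "__main__":
    T = [[2, 1, 0],
         [0, 1, 3]]
    T_new = apply_basic_move(T, i1=0, i2=1, j1=0, j2=2)
    print(T_new)
    print(margins(T) == margins(T_new))
    print(move_count(T, T_new))
```

A move is only admissible when the two decreased cells are positive, so a sampler would need to check that no count becomes negative before accepting it.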

Reversibility: For any T, T′ ∈ Γ₀, by the proof above there are a finite number of intermediate states T₁, ..., T_{m−1} and sub-tables D₁, ..., D_m such that, with T₀ = T and T_m = T′, consecutive states T_{i−1} and T_i differ only on the sub-table D_i, and T′ is obtained from T by successive transitions via T₁, ..., T_{m−1}. By the definition of the transition probability, it is easy to check that $ P(T_{i-1})P(T_i|B_{D_i})=P(T_i)P(T_{i-1}|B_{D_i}) $, so we have
$$ \begin{aligned} P(T)P(T^{'}|T)&=P(T)\prod_{i=1}^mP(D_i)P(T_i|T_{i-1})=\prod_{i=1}^mP(D_i)P(T_{i-1})P(T_i|B_{D_i})\\ &=\prod_{i=1}^mP(D_i)P(T_i)P(T_{i-1}|B_{D_i})=P(T^{'})\prod_{i=1}^mP(D_i)P(T_{i-1}|T_i)=P(T^{'})P(T|T^{'}). \end{aligned} $$

Reversibility ensures that P(·) is an invariant distribution of the chain, and irreducibility ensures that P(·) is the unique equilibrium distribution. Since the state space of the chain is finite, (6) and (7) follow directly from results in Chung (1960, p. 99).
(ii)
Liu et al. (1995) proved these results for the Gibbs sampler, and the same method applies here. The required conditions (a), (b) and (c) in Liu et al. (1995) are satisfied by this sampler since the state space is finite and the chain is irreducible. One then defines the forward operator and follows the same steps as there.
(iii)
Since P(·) is an (initial) stationary distribution of the chain, E_P|g(T)| < ∞, and irreducibility of the chain is equivalent to the chain being Markov ergodic, Theorem 3.6.7 in Stout (1974) gives the result.
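
The reversibility argument in part (i) can be checked numerically on a tiny example. The sketch below is an illustration under stated assumptions rather than the paper's construction: it assumes a two-way table with fixed margins, takes P(T) proportional to 1/∏ T_ij! (the multiple hypergeometric form), chooses a 2 × 2 sub-table uniformly at random, and redraws it from P conditional on the cells outside it. All function names are hypothetical.

```python
from itertools import combinations, product
from math import factorial, isclose

def enumerate_tables(row_sums, col_sums):
    """All non-negative integer tables with the given margins (tiny cases only)."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    tables = []
    for cells in product(range(max(row_sums) + 1), repeat=n_rows * n_cols):
        t = tuple(tuple(cells[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows))
        if ([sum(r) for r in t] == list(row_sums)
                and [sum(c) for c in zip(*t)] == list(col_sums)):
            tables.append(t)
    return tables

def weight(t):
    """Unnormalised P(T), assumed proportional to 1 / prod(T_ij!)."""
    w = 1.0
    for row in t:
        for x in row:
            w /= factorial(x)
    return w

def block_kernel(tables):
    """One-step transition probabilities K[s][t] of the 2 x 2 block update."""
    n_rows, n_cols = len(tables[0]), len(tables[0][0])
    blocks = [(r, c) for r in combinations(range(n_rows), 2)
              for c in combinations(range(n_cols), 2)]
    K = {s: {t: 0.0 for t in tables} for s in tables}
    for rows, cols in blocks:
        def outside(t):
            # Cells not covered by the current sub-table.
            return tuple(t[i][j] for i in range(n_rows) for j in range(n_cols)
                         if not (i in rows and j in cols))
        for s in tables:
            # States reachable from s: same cells outside the sub-table.
            group = [t for t in tables if outside(t) == outside(s)]
            z = sum(weight(t) for t in group)
            for t in group:
                K[s][t] += weight(t) / (z * len(blocks))
    return K

tables = enumerate_tables((2, 2), (1, 1, 2))
total = sum(weight(t) for t in tables)
pi = {t: weight(t) / total for t in tables}
K = block_kernel(tables)

rows_ok = all(isclose(sum(K[s].values()), 1.0) for s in tables)
balanced = all(isclose(pi[s] * K[s][t], pi[t] * K[t][s], abs_tol=1e-12)
               for s in tables for t in tables)
print(rows_ok, balanced)
```

The brute-force enumeration is only feasible for very small margins; it serves to confirm detailed balance, from which invariance of P(·) follows.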

About this article

Cite this article

Yuan, A., Yang, Y. A markov chain sampler for contingency table exact inference. Computational Statistics 20, 63–80 (2005). https://doi.org/10.1007/BF02736123

Published: 01 March 2005
Issue Date: March 2005
DOI: https://doi.org/10.1007/BF02736123
