Abstract
The availability of high-density single nucleotide polymorphism (SNP) data has made genome-wide association studies computationally challenging. Two-locus epistasis (gene-gene interaction) detection has attracted great research interest as a promising method for the genetic analysis of complex diseases. In this paper, we propose a general approach, COE, for efficient large-scale gene-gene interaction analysis, which supports a wide range of tests. In particular, we show that many commonly used statistics are convex functions. From the observed values of the events in a two-locus association test, we can develop an upper bound on the test value. Such an upper bound depends only on the single-locus test and the genotype of the SNP-pair. We thus group and index SNP-pairs by their genotypes. This indexing structure can benefit the computation of all convex statistics. Utilizing the upper bound and the indexing structure, we can prune most of the SNP-pairs without compromising the optimality of the result. Our approach is especially efficient for large permutation tests. Extensive experiments demonstrate that our approach provides orders of magnitude performance improvement over the brute-force approach.
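The pruning idea rests on a general property: a convex function over an interval attains its maximum at an endpoint. The sketch below illustrates that property on the simplest case, a 2x2 Pearson chi-square table with fixed margins, where the statistic is a convex (quadratic) function of a single cell count. It is not the bound derived in the paper; the function names (`chi2_2x2`, `chi2_upper_bound`, `can_prune`) and the 2x2 setting are assumptions chosen for illustration only.

```python
def chi2_2x2(a, r1, c1, n):
    """Pearson chi-square of a 2x2 table given its top-left cell `a`,
    first-row total `r1`, first-column total `c1`, and grand total `n`."""
    r2, c2 = n - r1, n - c1
    if 0 in (r1, r2, c1, c2):
        return 0.0  # degenerate margins: no association can be measured
    # With fixed margins, ad - bc simplifies to a*n - r1*c1,
    # so the statistic is quadratic (hence convex) in `a`.
    return n * (a * n - r1 * c1) ** 2 / (r1 * r2 * c1 * c2)


def feasible_range(r1, c1, n):
    """Smallest and largest values the top-left cell can take."""
    return max(0, r1 + c1 - n), min(r1, c1)


def chi2_upper_bound(r1, c1, n):
    """Upper bound from the margins alone: a convex function on an
    interval is maximized at one of the two endpoints."""
    lo, hi = feasible_range(r1, c1, n)
    return max(chi2_2x2(lo, r1, c1, n), chi2_2x2(hi, r1, c1, n))


def can_prune(r1, c1, n, threshold):
    """True if no table with these margins can reach `threshold`,
    so the exact statistic never needs to be computed."""
    return chi2_upper_bound(r1, c1, n) < threshold
```

In this toy setting, a candidate whose margin-only bound falls below the current significance threshold can be skipped without evaluating its full table, which mirrors in spirit the pruning described in the abstract.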
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, X., Pan, F., Xie, Y., Zou, F., Wang, W. (2009). COE: A General Approach for Efficient Genome-Wide Two-Locus Epistasis Test in Disease Association Study. In: Batzoglou, S. (ed.) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_19
DOI: https://doi.org/10.1007/978-3-642-02008-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02007-0
Online ISBN: 978-3-642-02008-7
eBook Packages: Computer Science, Computer Science (R0)