Abstract
Our rapidly growing knowledge regarding genetic variation in the human genome offers great potential for understanding the genetic etiology of disease. This, in turn, could revolutionize detection, treatment, and in some cases prevention of disease. While genes for most of the rare monogenic diseases have already been discovered, most common diseases are complex traits, resulting from multiple gene–gene and gene-environment interactions. Detecting epistatic genetic interactions that predispose for disease is an important, but computationally daunting, task currently facing bioinformaticists. Here, we propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman’s “random chemistry” approach to detecting small auto-catalytic sets of molecules from within large sets. Although the algorithm is conceptually straightforward, its success hinges upon the creation of a fitness function able to discriminate large sets that contain subsets of interacting genetic features from those that don’t. Here, we employ an approximate and noisy fitness function based on the ReliefF data mining algorithm. We establish proof-of-concept using synthetic data sets, where individual features have no marginal effects. We show that the resulting algorithm can successfully detect epistatic pairs from up to 1,000 candidate single nucleotide polymorphisms in time that is linear in the size of the initial set, although success rate degrades as heritability declines. Research continues into seeking a more accurate fitness approximator for large sets and other algorithmic improvements that will enable us to extend the approach to larger data sets and to lower heritabilities.
Similar content being viewed by others
References
Barrett, H.H., Myers, K.J.: Foundations of Image Science. John Wiley & Sons, Inc., New Jersey (2004)
Culverhouse, R., Suarez, B.K., Lin, J., Reich, T.: A perspective on epistasis: Limits of models displaying no main effect. Am. J. Hum. Genet. 70, 461–471 (2002)
Glazier, A.M., Nadeau, J.H., Aitman, T.J.: Finding genes that underlie complex traits. Science 298, 2345–2349 (2002)
Hirschhorn, J.N., Daly, M.J.: Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6, 95–108 (2005)
Hoh, J., Wille, A., Ott, J.: Trimming, weighting, and grouping SNPs in human case-control association studies. Gen. Res. 11, 2115–2119 (2001)
International HapMap Consortium: The international HapMap project. Nature 426, 789–796 (2003)
International human genome sequencing consortium: Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001)
International SNP map working group: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001)
Kauffman, S.: At Home in the Universe: The Search for the Laws of Self-Organization and Complexity. Oxford Univ. Press, USA (1996)
Kruglyak, L., Nickerson, D.A.: Variation is the spice of life. Nat. Genet. 27, 234–236 (2001)
Lucek, P.R., Ott, J.: Neural network analysis of complex traits. Gen. Epidem. 14, 1101–1106 (1997)
McKinney, B.A., Reif, D.M., Ritchie, M.D., Moore, J.H.: Machine learning for detecting gene-gene interactions. Appl. Bioinformatics 5, 77–88 (2006)
Merikangas, K.R., Low, N.C.P, Hardy, J.: Understanding sources of complexity in chronic diseases—the importance of integration of genetics and epidemiology. Int. J. Epidemiol. 35, 590–592 (2006)
Moore, J.H.: The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum. Hered. 56, 73–82 (2003)
Moore, J.H.: Computational analysis of gene-gene interactions in common human diseases using multifactor dimensionality reduction. Expert Rev. Mol. Diagn. 4, 795–803 (2004)
Moore, J.H., Gilbert, J.C., Tsai, C.T., Chiang, F.T., Holden, T., Barney, N, White, B.C.: A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. J. Theor. Biol. 241, 252–261 (2006)
Moore, J.H., Ritchie, M.D.: The challenges of whole-genome approaches to common diseases. JAMA 291, 1642–1643 (2002)
Moore J.H., White B.C.: Genome-wide genetic analysis using genetic programming: The critical need for expert knowledge. In: Riolo, R.L., Soule, T., Worzel, B. (eds.) Genetic Programming Theory and Practice IV. Springer, New York (2006)
Moore J.H., White B.C.: Tuning ReliefF for genome-wide genetic analysis. In: Rajapakse, J.C. et al. (eds.) Lecture Notes in Computer Science, 4447, pp. 166–175, Springer, New York (2007)
Ott, J., Hoh, J.: Statistical multilocus methods for disequilibrium analysis in complex traits. Hum. Mut. 17, 285–288 (2001)
Peltonen, L., McKusick, V.A.: Dissecting human disease in the postgenomic era. Science 291, 1224–1229 (2001)
Proulx, S.R., Phillips, P.C.: The opportunity for canalization and the evolution of genetic networks. Am. Nat. 165, 147–162 (2005)
Ravasz, E., Somera, A.L., Mongru, D.A., Oltvai, Z.N., Barabasi, A.-L.: Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002)
Ritchie, M.D., Hahn, L.W., Roodi, N., Bailey, L.R., Dupont, W.D., Parl, F.F., Moore, J.H.: Multifactor dimensionality reduction reveals high-order interactions among estrogen metabolism genes in sporadic breast cancer. Am. J. Hum. Gen. 69, 138–147 (2001)
Robnik-Sikonja, M., Konenenko, I.: Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 53, 23–69 (2003)
Syvanen, A.C.: Accessing genetic variation: genotyping single nucleotide polymorphisms. Nat. Rev. Genet. 2, 930–942 (2001)
Thornton-Wells, T.A., Moore, J.H., Haines, J.L.: Genetics, statistics and human disease: analytical retooling for complexity. Trends Genet. 20, 640–647 (2004)
Tong, A.H. et al.: Global mapping of the yeast genetic interaction network. Science 303, 808–813 (2004)
Venter, J.C., et al.: The sequence of the human genome. Science 291, 1304–1351 (2001)
Wang, W.Y., Barratt, B.J., Clayton, D.G., Todd, J.A.: Genome-wide association studies: theoretical and practical concerns. Nat. Rev. Genet. 6, 109–118 (2005)
White, B.C., Gilbert, J.C., Reif, D.M., Moore, J.H.: A statistical comparison of grammatical evolution strategies in the domain of human genetics. In: Corne, D. et al (eds.) Proc. of the IEEE Congress on Evol. Computing pp. 676–682. IEEE Press, Edinburgh, UK, (2005)
Acknowledgments
This work was supported, in part, by a pilot award and a graduate research assistantship, funded by DOE-FG02-00ER45828 awarded by the US Department of Energy through its EPSCoR Program and by National Institutes of Health grants AI59694 and LM009012. We thank Joshua Gilbert for his aid in creating the synthetic data sets.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Eppstein, M.J., Payne, J.L., White, B.C. et al. Genomic mining for complex disease traits with “random chemistry”. Genet Program Evolvable Mach 8, 395–411 (2007). https://doi.org/10.1007/s10710-007-9039-5
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10710-007-9039-5