Genomic mining for complex disease traits with “random chemistry”

  • Original paper
Genetic Programming and Evolvable Machines

Abstract

Our rapidly growing knowledge regarding genetic variation in the human genome offers great potential for understanding the genetic etiology of disease. This, in turn, could revolutionize detection, treatment, and in some cases prevention of disease. While genes for most of the rare monogenic diseases have already been discovered, most common diseases are complex traits, resulting from multiple gene–gene and gene–environment interactions. Detecting epistatic genetic interactions that predispose to disease is an important, but computationally daunting, task currently facing bioinformaticists. Here, we propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman’s “random chemistry” approach to detecting small auto-catalytic sets of molecules from within large sets. Although the algorithm is conceptually straightforward, its success hinges upon the creation of a fitness function able to discriminate large sets that contain subsets of interacting genetic features from those that do not. Here, we employ an approximate and noisy fitness function based on the ReliefF data mining algorithm. We establish proof-of-concept using synthetic data sets, where individual features have no marginal effects. We show that the resulting algorithm can successfully detect epistatic pairs from up to 1,000 candidate single nucleotide polymorphisms in time that is linear in the size of the initial set, although success rate degrades as heritability declines. Research continues into seeking a more accurate fitness approximator for large sets and other algorithmic improvements that will enable us to extend the approach to larger data sets and to lower heritabilities.
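The set-shrinking idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the binary XOR data, the simplified Relief-style score (a stand-in for ReliefF), the "sum of the two largest weights" subset fitness, and all names and parameters (`make_xor_data`, `relief_fitness`, `random_chemistry`, `shrink`, `n_tries`, `n_probe`) are assumptions chosen for illustration only.

```python
import random

def make_xor_data(n_samples, n_features, seed=0):
    """Toy data (assumption): class = XOR of features 0 and 1, so neither
    feature has a marginal effect on its own; all other features are noise."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [row[0] ^ row[1] for row in X]
    return X, y

def relief_fitness(X, y, subset, n_probe=30, seed=0):
    """Simplified Relief-style score of a feature subset (not full ReliefF).
    For each probe instance, find its nearest hit (same class) and nearest
    miss (other class) under Hamming distance restricted to `subset`.
    A feature gains weight when it differs on the miss and loses weight
    when it differs on the hit. The subset fitness is the sum of the two
    largest averaged weights (an assumed choice for this sketch)."""
    rng = random.Random(seed)
    subset = list(subset)
    counts = {f: 0 for f in subset}
    for i in rng.sample(range(len(X)), n_probe):
        nearest = {True: (None, None), False: (None, None)}  # hit / miss
        for j in range(len(X)):
            if j == i:
                continue
            d = sum(X[i][f] != X[j][f] for f in subset)
            same = y[i] == y[j]
            if nearest[same][0] is None or d < nearest[same][0]:
                nearest[same] = (d, j)
        hit, miss = nearest[True][1], nearest[False][1]
        for f in subset:
            counts[f] += (X[i][f] != X[miss][f]) - (X[i][f] != X[hit][f])
    top_two = sorted(counts.values(), reverse=True)[:2]
    return sum(top_two) / n_probe

def random_chemistry(X, y, features, shrink=0.5, n_tries=20, seed=0):
    """Shrink a large candidate set toward a pair: at each stage, draw
    `n_tries` random subsets of the current set, each `shrink` times its
    size, and keep the one with the highest (noisy) fitness."""
    rng = random.Random(seed)
    current = list(features)
    while len(current) > 2:
        size = max(2, int(len(current) * shrink))
        candidates = [rng.sample(current, size) for _ in range(n_tries)]
        current = max(candidates, key=lambda s: relief_fitness(X, y, s))
    return sorted(current)

if __name__ == "__main__":
    X, y = make_xor_data(200, 16, seed=1)
    print("fitness of interacting pair:", relief_fitness(X, y, [0, 1]))
    print("fitness of a noise pair:    ", relief_fitness(X, y, [2, 3]))
    print("pair kept by the search:    ", random_chemistry(X, y, range(16)))
```

Because each stage shrinks the set by a constant factor, the number of stages grows only logarithmically with the initial set size. Whether the search actually retains the interacting pair depends on how well the fitness separates large subsets that contain it from those that do not, which is the difficulty the abstract highlights; in this toy setting the score is only clearly informative for small subsets.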



Acknowledgments

This work was supported, in part, by a pilot award and a graduate research assistantship, funded by DOE-FG02-00ER45828 awarded by the US Department of Energy through its EPSCoR Program and by National Institutes of Health grants AI59694 and LM009012. We thank Joshua Gilbert for his aid in creating the synthetic data sets.

Author information

Corresponding author

Correspondence to Margaret J. Eppstein.


About this article

Cite this article

Eppstein, M.J., Payne, J.L., White, B.C. et al. Genomic mining for complex disease traits with “random chemistry”. Genet Program Evolvable Mach 8, 395–411 (2007). https://doi.org/10.1007/s10710-007-9039-5


  • DOI: https://doi.org/10.1007/s10710-007-9039-5
