Genomic mining for complex disease traits with “random chemistry”

Eppstein, Margaret J.; Payne, Joshua L.; White, Bill C.; Moore, Jason H.

doi:10.1007/s10710-007-9039-5


Original paper
Published: 04 October 2007

Volume 8, pages 395–411, (2007)

Genetic Programming and Evolvable Machines



Abstract

Our rapidly growing knowledge regarding genetic variation in the human genome offers great potential for understanding the genetic etiology of disease. This, in turn, could revolutionize detection, treatment, and in some cases prevention of disease. While genes for most of the rare monogenic diseases have already been discovered, most common diseases are complex traits, resulting from multiple gene–gene and gene-environment interactions. Detecting epistatic genetic interactions that predispose for disease is an important, but computationally daunting, task currently facing bioinformaticists. Here, we propose a new evolutionary approach that attempts to hill-climb from large sets of candidate epistatic genetic features to smaller sets, inspired by Kauffman’s “random chemistry” approach to detecting small auto-catalytic sets of molecules from within large sets. Although the algorithm is conceptually straightforward, its success hinges upon the creation of a fitness function able to discriminate large sets that contain subsets of interacting genetic features from those that don’t. Here, we employ an approximate and noisy fitness function based on the ReliefF data mining algorithm. We establish proof-of-concept using synthetic data sets, where individual features have no marginal effects. We show that the resulting algorithm can successfully detect epistatic pairs from up to 1,000 candidate single nucleotide polymorphisms in time that is linear in the size of the initial set, although success rate degrades as heritability declines. Research continues into seeking a more accurate fitness approximator for large sets and other algorithmic improvements that will enable us to extend the approach to larger data sets and to lower heritabilities.
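The set-reduction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_chemistry`, the halving schedule (`shrink=0.5`), the retry limit, and the exact oracle used as the fitness test are all assumptions made for illustration. In particular, the paper relies on an approximate, noisy ReliefF-based fitness function, whereas the toy oracle here simply checks whether a hidden pair is present.

```python
import random


def random_chemistry(features, contains_signal, shrink=0.5,
                     target_size=2, max_tries=10_000, rng=None):
    """Shrink `features` to a small subset that still passes `contains_signal`.

    Hypothetical sketch: at each step, draw random subsets of the current
    set until one still passes the test, then continue from that subset.
    """
    rng = rng or random.Random()
    current = list(features)
    while len(current) > target_size:
        # Assumed schedule: cut the set roughly in half each step.
        new_size = max(target_size, int(len(current) * shrink))
        for _ in range(max_tries):
            candidate = rng.sample(current, new_size)
            if contains_signal(candidate):
                current = candidate
                break
        else:
            raise RuntimeError("no passing subset found within max_tries")
    return sorted(current)


# Toy stand-in for the fitness test: an exact oracle that knows the hidden
# interacting pair. A real fitness function would be noisy and data-driven.
hidden_pair = {17, 423}


def oracle(subset):
    return hidden_pair.issubset(subset)


result = random_chemistry(range(1000), oracle, rng=random.Random(0))
print(result)  # [17, 423]
```

With an exact oracle the search always ends on the hidden pair. With a noisy fitness function a subset that has lost the pair may occasionally be accepted, which is one way to picture why the abstract reports that success rate degrades as heritability declines.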



Acknowledgments

This work was supported, in part, by a pilot award and a graduate research assistantship, funded by DOE-FG02-00ER45828 awarded by the US Department of Energy through its EPSCoR Program and by National Institutes of Health grants AI59694 and LM009012. We thank Joshua Gilbert for his aid in creating the synthetic data sets.

Author information

Authors and Affiliations

Departments of Computer Science and Biology, University of Vermont, Burlington, VT, 05405, USA
Margaret J. Eppstein
Department of Computer Science, University of Vermont, Burlington, VT, 05405, USA
Joshua L. Payne
Computational Genetics Laboratory, Dartmouth College, Lebanon, NH, 03756, USA
Bill C. White & Jason H. Moore


Corresponding author

Correspondence to Margaret J. Eppstein.


About this article

Cite this article

Eppstein, M.J., Payne, J.L., White, B.C. et al. Genomic mining for complex disease traits with “random chemistry”. Genet Program Evolvable Mach 8, 395–411 (2007). https://doi.org/10.1007/s10710-007-9039-5


Received: 14 June 2007
Revised: 14 June 2007
Accepted: 24 July 2007
Published: 04 October 2007
Issue Date: December 2007
DOI: https://doi.org/10.1007/s10710-007-9039-5
