A Model Free Method to Generate Human Genetics Datasets with Complex Gene-Disease Relationships

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2010)

Abstract

A goal of human genetics is to discover genetic factors that influence individuals’ susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components rather than from single-component failures. This greatly complicates both the task of selecting informative genetic variations and the task of modeling the interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. These methods have so far been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model-free evolution strategy to generate datasets that display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach can generate a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We generate six hundred Pareto fronts, one for each independent run of our algorithm. In each run, the predictiveness of single genetic variations and of pairs of genetic variations is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models, which could improve our ability to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make the entire Pareto-optimal front of datasets from each run freely available to the community so that novel methods may be rigorously evaluated. These 56,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/ .
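The abstract describes a multi-objective search in which low single-locus and pairwise predictiveness must be traded off against high higher-order predictiveness, with the non-dominated datasets of each run forming a Pareto front. The sketch below illustrates only the scoring and Pareto-filtering side of that idea, not the authors' evolution strategy (no mutation or selection is shown). It rests on assumptions not stated in the abstract: "predictiveness" is taken to be the training balanced accuracy of an MDR-style lookup table over each SNP combination, case status is coded 1 = case and 0 = control, and the helper names `balanced_accuracy`, `objective_vector`, `dominates`, and `pareto_front` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Tuple[float, float, float]


def balanced_accuracy(genotypes: Sequence[Sequence[int]],
                      status: Sequence[int],
                      snps: Tuple[int, ...]) -> float:
    """Hypothetical predictiveness score for one SNP combination.

    Assumption: each multilocus genotype cell is labelled high-risk when its
    case:control ratio exceeds the overall ratio (MDR-style lookup table),
    and the score is that table's balanced accuracy on the same data.
    Requires status coded 1 = case, 0 = control, with both classes present.
    """
    counts = defaultdict(lambda: [0, 0])  # genotype cell -> [controls, cases]
    for row, y in zip(genotypes, status):
        counts[tuple(row[j] for j in snps)][y] += 1
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    overall_ratio = n_cases / n_controls
    true_pos = true_neg = 0
    for controls, cases in counts.values():
        if cases > overall_ratio * controls:  # high-risk cell: predict "case"
            true_pos += cases
        else:                                  # low-risk cell: predict "control"
            true_neg += controls
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def objective_vector(genotypes: Sequence[Sequence[int]],
                     status: Sequence[int],
                     order: int) -> Vector:
    """Best single-SNP, best pairwise, and best order-k score for a dataset.

    The order-k term is negated so every component is minimized: low 1- and
    2-way signal and high k-way signal all become "smaller is better".
    """
    n_snps = len(genotypes[0])

    def best(k: int) -> float:
        return max(balanced_accuracy(genotypes, status, combo)
                   for combo in combinations(range(n_snps), k))

    return (best(1), best(2), -best(order))


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Vector]) -> List[int]:
    """Indices of non-dominated objective vectors (simple O(n^2) check)."""
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for q in points)]
```

As a toy check, a dataset whose status is the parity (XOR) of three binary SNPs carries no single-locus or pairwise signal but is fully explained by the three-way combination, so `objective_vector(genotypes, status, order=3)` gives `(0.5, 0.5, -1.0)`. The quadratic Pareto filter is chosen for clarity; it is adequate for small fronts but not tuned for large ones.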

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Greene, C.S., Himmelstein, D.S., Moore, J.H. (2010). A Model Free Method to Generate Human Genetics Datasets with Complex Gene-Disease Relationships. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_7

  • DOI: https://doi.org/10.1007/978-3-642-12211-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12210-1

  • Online ISBN: 978-3-642-12211-8

  • eBook Packages: Computer Science, Computer Science (R0)