Generating Linkage Disequilibrium Patterns in Data Simulations Using genomeSIMLA

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2008)

Abstract

Whole-genome association (WGA) studies are becoming a common tool for exploring the genetic components of common disease. Analyzing such large-scale data presents unique challenges, including multiple testing, correlated independent variables, and large multivariate model spaces. These issues have prompted the development of novel computational approaches. Methods development requires thorough, extensive simulation studies to evaluate the power and validity of these approaches. Many data simulation packages exist; however, the resulting data are often overly simplistic and do not match the complexity of real data, especially with respect to linkage disequilibrium (LD). To overcome this limitation, we have developed genomeSIMLA, a forward-time population simulation method that can simulate realistic patterns of LD in both family-based and case-control datasets. In this manuscript, we demonstrate how LD patterns in the simulated data change under different initialization settings of the population growth curve parameters. These results provide guidelines for simulating WGA datasets whose properties resemble those of the HapMap.
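To make the idea of forward-time simulation concrete, the following is a minimal sketch of how LD between two loci can be tracked while a population grows. It is not the genomeSIMLA implementation. The logistic growth curve, the two-locus haplotype model, the random-mating scheme, and all function and parameter names (`r_squared`, `logistic_size`, `simulate`, `recomb`) are assumptions chosen for illustration only.

```python
import math
import random


def r_squared(haplotypes):
    """Return the LD measure r^2 for a list of two-locus haplotypes.

    Each haplotype is a tuple (a, b) with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a locus is monomorphic, so r^2 is undefined; report 0
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    return d * d / denom


def logistic_size(gen, n0, capacity, rate):
    """Population size at generation `gen` under an assumed logistic curve."""
    ratio = (capacity - n0) / n0
    return int(round(capacity / (1 + ratio * math.exp(-rate * gen))))


def simulate(generations, n0, capacity, rate, recomb, seed=0):
    """Advance a haploid pool forward in time; return [(size, r^2), ...].

    The founding pool is in complete LD (only 00 and 11 haplotypes).
    Each child copies one random parent, or with probability `recomb`
    takes locus A from one parent and locus B from another.
    """
    rng = random.Random(seed)
    pool = [(0, 0)] * (n0 // 2) + [(1, 1)] * (n0 - n0 // 2)
    history = []
    for gen in range(1, generations + 1):
        size = logistic_size(gen, n0, capacity, rate)
        children = []
        for _ in range(size):
            p1, p2 = rng.choice(pool), rng.choice(pool)
            if rng.random() < recomb:
                children.append((p1[0], p2[1]))      # recombinant haplotype
            else:
                children.append(p1)                  # non-recombinant copy
        pool = children
        history.append((size, r_squared(pool)))
    return history


if __name__ == "__main__":
    # Compare LD decay under slow and fast growth (illustrative settings).
    for rate in (0.02, 0.2):
        size, r2 = simulate(100, 200, 5000, rate, recomb=0.01)[-1]
        print(f"rate={rate}: final size={size}, r^2={r2:.3f}")
```

Varying `rate`, `n0`, and `capacity` in this toy model changes how long the pool stays small, and therefore how much drift acts on the haplotype frequencies alongside recombination. That is the kind of dependence on growth curve settings the abstract describes, here in a far simpler setting than a whole genome.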

Editor information

Elena Marchiori, Jason H. Moore

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Edwards, T.L. et al. (2008). Generating Linkage Disequilibrium Patterns in Data Simulations Using genomeSIMLA. In: Marchiori, E., Moore, J.H. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2008. Lecture Notes in Computer Science, vol 4973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78757-0_3

  • DOI: https://doi.org/10.1007/978-3-540-78757-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78756-3

  • Online ISBN: 978-3-540-78757-0

  • eBook Packages: Computer Science, Computer Science (R0)
