Abstract
Whole-genome association (WGA) studies are becoming a common tool for exploring the genetic components of common disease. The analysis of such large-scale data presents unique analytical challenges, including multiple testing, correlated independent variables, and large multivariate model spaces. These issues have prompted the development of novel computational approaches. Evaluating the power and validity of such approaches requires thorough, extensive simulation studies. Many data simulation packages exist; however, the data they produce are often overly simplistic and do not match the complexity of real data, especially with respect to linkage disequilibrium (LD). To overcome this limitation, we have developed genomeSIMLA, a forward-time population simulation method that can simulate realistic patterns of LD in both family-based and case-control datasets. In this manuscript, we demonstrate how the LD patterns of the simulated data change under different initialization settings of the population growth curve parameters. These results provide guidelines for simulating WGA datasets whose properties resemble those of the HapMap.
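To illustrate the idea of forward-time simulation under a growth curve, the following is a minimal Python sketch and not genomeSIMLA's own algorithm. It tracks a pool of two-locus chromosomes through successive generations of random mating with recombination and records LD (measured as r²) at each generation. Every name and parameter here (`logistic_size`, `next_generation`, `simulate`, the recombination rate, the curve settings) is a hypothetical choice made for illustration, and a plain logistic curve stands in for whatever growth function the software actually uses.

```python
import math
import random


def logistic_size(gen, start=200, capacity=2000, rate=0.05, midpoint=100):
    """Pool size at generation `gen` under a logistic curve (assumed parameters)."""
    growth = (capacity - start) / (1.0 + math.exp(-rate * (gen - midpoint)))
    return int(start + growth)


def r_squared(haplotypes):
    """LD (r^2) between two biallelic loci, given a list of (a, b) haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # one locus is fixed, so r^2 is undefined; report 0
    d = p_ab - p_a * p_b
    return d * d / denom


def next_generation(haplotypes, size, recomb_rate, rng):
    """Draw `size` offspring chromosomes from the current pool.

    Simplification: the pool is treated as haploid chromosomes. With
    probability `recomb_rate` an offspring is a recombinant of two
    randomly chosen parental chromosomes; otherwise it copies the first.
    """
    offspring = []
    for _ in range(size):
        h1 = rng.choice(haplotypes)
        h2 = rng.choice(haplotypes)
        if rng.random() < recomb_rate:
            offspring.append((h1[0], h2[1]))
        else:
            offspring.append(h1)
    return offspring


def simulate(generations=200, recomb_rate=0.01, seed=1):
    """Return a list of (generation, pool size, r^2) tuples."""
    rng = random.Random(seed)
    # Initial pool: alleles drawn independently at each locus.
    pool = [(rng.randint(0, 1), rng.randint(0, 1))
            for _ in range(logistic_size(0))]
    trajectory = []
    for gen in range(1, generations + 1):
        pool = next_generation(pool, logistic_size(gen), recomb_rate, rng)
        trajectory.append((gen, len(pool), r_squared(pool)))
    return trajectory


if __name__ == "__main__":
    for gen, size, r2 in simulate()[::50]:
        print(gen, size, round(r2, 4))
```

Changing the arguments of `logistic_size` (for example a smaller `start` or a later `midpoint`) lengthens the period of small population size, during which genetic drift has more opportunity to generate LD. That is the kind of sensitivity to growth curve settings the abstract refers to.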
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Edwards, T.L. et al. (2008). Generating Linkage Disequilibrium Patterns in Data Simulations Using genomeSIMLA. In: Marchiori, E., Moore, J.H. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2008. Lecture Notes in Computer Science, vol 4973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78757-0_3
DOI: https://doi.org/10.1007/978-3-540-78757-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78756-3
Online ISBN: 978-3-540-78757-0
eBook Packages: Computer Science, Computer Science (R0)