Abstract
In human genetics, large-scale data are now available thanks to advances in genotyping technologies and international collaborative projects. Our ongoing study of obesity involves Affymetrix 500k GeneChips on approximately 7000 individuals from the European Prospective Investigation of Cancer (EPIC) Norfolk study. Although the scale of our data is well beyond the capacity of many software systems, we have successfully performed the analysis using the Statistical Analysis System (SAS) software. Our implementation trades memory for computing time and requires only a moderate hardware configuration. Because it relies on such an established system, our work extends some earlier discussions in a more constructive and accessible way. We report our findings and give some recommendations for using SAS. We also briefly compare our implementation with alternatives. Our work is relevant to researchers analysing large-scale data in general, and genomewide association studies in particular.
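The idea of trading memory for computing time can be illustrated with a small sketch. The Python code below is not the authors' SAS implementation. It assumes a hypothetical whitespace-delimited text layout with one SNP per line (an identifier followed by the allele dosage 0/1/2 of every individual) and a quantitative trait such as BMI. It reads and tests one SNP at a time, so memory grows with the number of individuals rather than with the full individuals × SNPs genotype matrix, at the cost of parsing and processing the data SNP by SNP. The per-SNP test is a simple additive-model linear regression without covariates, chosen only for illustration; a real analysis would likely adjust for covariates.

```python
import io
import math
from typing import Iterable, Iterator, List, Sequence, Tuple


def per_snp_regression(trait: Sequence[float], dosages: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of trait on allele dosage and its t statistic."""
    n = len(trait)
    mx = sum(dosages) / n
    my = sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:  # monomorphic SNP: carries no information about the trait
        return 0.0, 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit: t statistic is unbounded
        return beta, 0.0 if beta == 0 else math.copysign(math.inf, beta)
    return beta, beta / se


def parse_rows(lines: Iterable[str]) -> Iterator[Tuple[str, List[float]]]:
    """Yield (snp_id, dosages) one line at a time (hypothetical file layout)."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0], [float(v) for v in fields[1:]]


def genome_scan(trait: Sequence[float], lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Stream SNPs so that only one genotype row is held in memory at a time."""
    for snp_id, dosages in parse_rows(lines):
        if len(dosages) != len(trait):
            raise ValueError(f"{snp_id}: expected {len(trait)} genotypes, got {len(dosages)}")
        beta, t = per_snp_regression(trait, dosages)
        yield snp_id, beta, t


if __name__ == "__main__":
    trait = [20.0, 22.0, 24.0, 26.0]  # toy trait values, not study data
    data = io.StringIO("rs1 0 1 2 2\nrs2 1 1 1 1\n")  # toy genotypes
    for snp_id, beta, t in genome_scan(trait, data):
        print(snp_id, round(beta, 3), round(t, 3))
```

In practice the `lines` argument would be an open file handle, so the scan never materialises the whole genotype matrix; a faster but more memory-hungry alternative would load a block of SNPs at once and test them with vectorised matrix operations.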
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhao, J.H., Luan, J., Tan, Q., Loos, R., Wareham, N. (2007). Analysis of Large Genomic Data in Silico: The EPIC-Norfolk Study of Obesity. In: Huang, DS., Heutte, L., Loog, M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Contemporary Intelligent Computing Techniques. ICIC 2007. Communications in Computer and Information Science, vol 2. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74282-1_87
DOI: https://doi.org/10.1007/978-3-540-74282-1_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74281-4
Online ISBN: 978-3-540-74282-1
eBook Packages: Computer Science, Computer Science (R0)