Part of the book series: Communications in Computer and Information Science (CCIS, volume 2)

Abstract

In human genetics, large-scale data are now available thanks to advances in genotyping technologies and to international collaborative projects. Our ongoing study of obesity involves Affymetrix 500k GeneChips on approximately 7000 individuals from the European Prospective Investigation of Cancer (EPIC) Norfolk study. Although data on this scale are well beyond the capacity of many software systems, we have successfully performed the analysis using the Statistical Analysis System (SAS). Our implementation trades memory for computing time and requires only a moderate hardware configuration. Because it relies on such an established system, our work extends some earlier discussions in a more constructive and accessible way. We report our findings and give some recommendations for using SAS. We also briefly compare our approach with alternative implementations. Our work is relevant to researchers analysing large-scale data in general, and to those conducting genomewide association studies in particular.
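The idea of trading memory for computing time can be illustrated outside SAS. The following is a minimal Python sketch, not the authors' SAS implementation. It assumes a quantitative trait such as BMI, additive 0/1/2 genotype coding, a hypothetical whitespace-delimited text layout with one SNP per line (`NA` marking a missing call), and a normal approximation for the p-value. Only one SNP's genotypes are held in memory at a time, so memory use stays roughly constant while run time grows with the number of SNPs.

```python
import math
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple


def read_snps(lines: Iterable[str]) -> Iterator[Tuple[str, List[Optional[int]]]]:
    """Parse a HYPOTHETICAL layout 'snp_id g1 g2 ... gn', where NA = missing call.

    Lines are consumed lazily, so only one SNP is in memory at a time.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        yield fields[0], [None if f == "NA" else int(f) for f in fields[1:]]


def snp_tests(
    trait: Sequence[float],
    snps: Iterable[Tuple[str, Sequence[Optional[int]]]],
) -> Iterator[Tuple[str, int, float, float, float]]:
    """Yield (snp_id, n, beta, se, p) from a per-SNP linear regression of trait on genotype."""
    for snp_id, genotypes in snps:
        # Keep only individuals with a called genotype.
        pairs = [(g, y) for g, y in zip(genotypes, trait) if g is not None]
        n = len(pairs)
        if n < 3:
            continue  # too few observations for a residual variance
        mean_g = sum(g for g, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
        if sxx == 0:
            continue  # monomorphic among called samples: slope undefined
        sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
        beta = sxy / sxx
        alpha = mean_y - beta * mean_g
        rss = sum((y - alpha - beta * g) ** 2 for g, y in pairs)
        se = math.sqrt(rss / (n - 2) / sxx)
        if se == 0:
            continue  # perfect fit: test statistic undefined in this sketch
        z = beta / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
        yield snp_id, n, beta, se, p
```

For a real file, `read_snps(open(path))` streams the SNPs line by line; the trait vector (about 7000 values here) is the only per-individual data kept in memory throughout. Reading each SNP once and discarding it costs repeated passes if further analyses are needed, which is the memory-for-time trade-off in miniature.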

Editor information

De-Shuang Huang, Laurent Heutte, Marco Loog

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, J.H., Luan, J., Tan, Q., Loos, R., Wareham, N. (2007). Analysis of Large Genomic Data in Silico: The EPIC-Norfolk Study of Obesity. In: Huang, DS., Heutte, L., Loog, M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Contemporary Intelligent Computing Techniques. ICIC 2007. Communications in Computer and Information Science, vol 2. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74282-1_87

  • DOI: https://doi.org/10.1007/978-3-540-74282-1_87

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74281-4

  • Online ISBN: 978-3-540-74282-1

  • eBook Packages: Computer Science, Computer Science (R0)