Population Stratification Analysis in Genome-Wide Association Studies

Mathematical Approaches to Polymer Sequence Analysis and Related Problems

Abstract

Differences in genetic background among two or more populations are an important source of confounding in case–control association studies. When populations of different ethnic groups are mixed, differences in allele frequency between case and control samples may reflect ancestry rather than a real association with the disease under study. This can easily produce a large number of false positive and false negative results in association analysis. Moreover, the growing need to combine data sets from different studies in order to increase statistical power makes the problem particularly important in current statistical genetics research. Several correction strategies have been proposed, but there is currently no consensus on a single powerful strategy for adjusting for population stratification. In this chapter, we discuss state-of-the-art strategies for correcting genome-wide association statistics by taking into account the ancestral structure of the population. After a short review of the most important methods and tools available, we show the results obtained on two real data sets and discuss the advantages and disadvantages of each algorithm.
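The confounding described in the abstract can be made concrete with a small numerical sketch. All figures below are hypothetical and chosen only for illustration: two populations carry the same allele at different frequencies, and cases are sampled mostly from one population and controls mostly from the other. Within each population the allele is unrelated to disease, yet the pooled odds ratio departs strongly from 1. A Mantel-Haenszel stratified estimate is used here as one simple textbook adjustment; it is not necessarily among the methods reviewed in the chapter.

```python
from typing import List, Tuple

# One 2x2 allele table: (case_alt, case_ref, control_alt, control_ref).
Table = Tuple[float, float, float, float]


def expected_table(n_cases: int, n_controls: int, alt_freq: float) -> Table:
    """Expected allele counts when the allele is unrelated to disease status."""
    case_alleles = 2 * n_cases        # two alleles per diploid individual
    control_alleles = 2 * n_controls
    return (
        case_alleles * alt_freq,
        case_alleles * (1 - alt_freq),
        control_alleles * alt_freq,
        control_alleles * (1 - alt_freq),
    )


def odds_ratio(table: Table) -> float:
    """Allelic odds ratio of a single 2x2 table."""
    a, b, c, d = table
    return (a * d) / (b * c)


def pool(tables: List[Table]) -> Table:
    """Collapse strata into one table, ignoring ancestry."""
    a, b, c, d = (sum(column) for column in zip(*tables))
    return (a, b, c, d)


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Mantel-Haenszel odds ratio across ancestry strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical design: population A has allele frequency 0.2 and supplies
# most of the cases; population B has frequency 0.6 and most of the controls.
pop_a = expected_table(n_cases=800, n_controls=200, alt_freq=0.2)
pop_b = expected_table(n_cases=200, n_controls=800, alt_freq=0.6)

naive_or = odds_ratio(pool([pop_a, pop_b]))
adjusted_or = mantel_haenszel_or([pop_a, pop_b])

print(f"pooled OR (ancestry ignored): {naive_or:.3f}")
print(f"Mantel-Haenszel OR (stratified by ancestry): {adjusted_or:.3f}")
```

With these assumed numbers the pooled odds ratio is about 0.36 even though the allele has no effect in either population, while the stratified estimate returns 1. The stratified approach requires ancestry labels to be known in advance, which is often not the case in genome-wide data.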

Acknowledgements

This work was supported by FIRB Italia-Israele [RBIN04SWHR]; a fellowship of the Doctorate School of Molecular Medicine, University of Milan; POCEMON [FP7-ICT-2007-216088]; HYPERGENES [HEALTH-F4-2007-201550]; InGenious HyperCare [LSHM-CT-2006-037093]; the Israel Science Foundation [Israel Academy of Sciences, Grant #348/09]; Enabling Grids for E-sciencE [INFSO-RI-222667]; and the CNR-BIOINFORMATICS, ITALBIONET, and Italian-Canada FIRB-MUR projects.

Corresponding author

Correspondence to Luciano Milanesi.

Copyright information

© 2011 Springer New York

About this chapter

Cite this chapter

Salvi, E. et al. (2011). Population Stratification Analysis in Genome-Wide Association Studies. In: Bruni, R. (eds) Mathematical Approaches to Polymer Sequence Analysis and Related Problems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6800-5_9
