Abstract
Differences in genetic background between two or more populations are an important source of confounding in case–control association studies. When populations of different ethnic groups are pooled, allele-frequency differences between case and control samples may reflect ancestry rather than a true association with the disease under study, which can produce a large number of false-positive and false-negative results. Moreover, the growing practice of combining data sets from different studies to increase statistical power makes this problem particularly important in current statistical genetics research. Several correction strategies have been proposed, but there is currently no consensus on a single, powerful strategy to adjust for population stratification. In this chapter, we discuss state-of-the-art strategies for correcting genome-wide association statistics for the ancestral structure of the population. After a short review of the most important methods and tools available, we present the results obtained on two real data sets and discuss the advantages and disadvantages of each algorithm.
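The confounding mechanism described in the abstract, and one of the simplest corrections for it, can be illustrated with a small simulation. The sketch below is not the chapter's method and uses no real data: it assumes two hypothetical subpopulations, A and B, whose allele frequencies differ by random drift. It draws cases mostly from A and controls mostly from B, then tests SNPs that have no effect on disease with a Pearson allelic chi-square test. Finally, it applies genomic control, which rescales every statistic by the inflation factor λ = median(χ²) / 0.4549, where 0.4549 is the median of a 1-df chi-square distribution. All function names, sample sizes and parameters (`drift_sd`, `case_mix`, `ctrl_mix`, `n_per_group`) are illustrative assumptions.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def allelic_chi2(case_alt, n_case, ctrl_alt, n_ctrl):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts."""
    observed = [[case_alt, n_case - case_alt], [ctrl_alt, n_ctrl - ctrl_alt]]
    row_totals = [n_case, n_ctrl]
    col_totals = [case_alt + ctrl_alt, (n_case - case_alt) + (n_ctrl - ctrl_alt)]
    total = n_case + n_ctrl
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def simulate_null_scan(n_snps, case_mix, ctrl_mix, n_per_group=400,
                       drift_sd=0.1, seed=1):
    """Chi-square statistics at SNPs with NO true disease effect.

    case_mix / ctrl_mix: hypothetical fraction of cases / controls drawn
    from subpopulation A; the remainder come from subpopulation B.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_snps):
        # Allele frequency in A, and a drifted frequency in B.
        p_a = rng.uniform(0.1, 0.9)
        p_b = min(0.99, max(0.01, p_a + rng.gauss(0.0, drift_sd)))
        alt_counts = []
        for mix in (case_mix, ctrl_mix):
            n_from_a = round(mix * n_per_group)
            alt = 0
            for i in range(n_per_group):
                p = p_a if i < n_from_a else p_b
                # Two alleles per diploid individual.
                alt += (rng.random() < p) + (rng.random() < p)
            alt_counts.append(alt)
        n_alleles = 2 * n_per_group
        stats.append(allelic_chi2(alt_counts[0], n_alleles,
                                  alt_counts[1], n_alleles))
    return stats


def genomic_control(stats):
    """Estimate lambda (conventionally floored at 1) and rescale the statistics."""
    lam = max(1.0, statistics.median(stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in stats]


# Cases enriched for subpopulation A, controls enriched for B.
stratified = simulate_null_scan(200, case_mix=0.7, ctrl_mix=0.3)
# Same ancestry mix in cases and controls: no confounding.
homogeneous = simulate_null_scan(200, case_mix=0.5, ctrl_mix=0.5)

lam_strat, corrected = genomic_control(stratified)
lam_homog, _ = genomic_control(homogeneous)
print(f"lambda (stratified sample): {lam_strat:.2f}")
print(f"lambda (matched ancestry):  {lam_homog:.2f}")
```

In this setting the stratified sample should show an inflation factor well above 1 even though no SNP is associated with disease, while the ancestry-matched sample stays near 1. Genomic control applies the same rescaling to every marker, so it can over-correct markers with little differentiation between subpopulations and under-correct strongly differentiated ones. Methods that model ancestry explicitly, such as principal-component adjustment, address this limitation at the cost of additional computation.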
Acknowledgements
This work was supported by FIRB Italia-Israele [RBIN04SWHR]; a fellowship of the Doctorate School of Molecular Medicine, University of Milan; POCEMON [FP7-ICT-2007-216088]; HYPERGENES [HEALTH-F4-2007-201550]; InGenious HyperCare [LSHM-CT-2006-037093]; the Israel Science Foundation [Israel Academy of Sciences, Grant #348/09]; Enabling Grids for E-sciencE [INFSO-RI-222667]; CNR-BIOINFORMATICS; ITALBIONET; and the Italian-Canada FIRB-MUR Projects.
Copyright information
© 2011 Springer New York
About this chapter
Cite this chapter
Salvi, E. et al. (2011). Population Stratification Analysis in Genome-Wide Association Studies. In: Bruni, R. (eds) Mathematical Approaches to Polymer Sequence Analysis and Related Problems. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6800-5_9
DOI: https://doi.org/10.1007/978-1-4419-6800-5_9
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6799-2
Online ISBN: 978-1-4419-6800-5
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)