On Assigning Individuals from Cryptic Population Structures to Optimal Predicted Subpopulations: An Empirical Evaluation of Non-parametric Population Structure Analysis Techniques

  • Conference paper
Computational Systems-Biology and Bioinformatics (CSBio 2010)

Abstract

Many algorithms have been proposed to analyze population structure from the single nucleotide polymorphism (SNP) genotyping data of a set of individuals and to assign those individuals to genetically similar groups. These algorithms fall into two computational paradigms: parametric and non-parametric approaches. Although the parametric approach is a gold standard for population structure analysis, the computational burden of running these algorithms is unacceptable for large, complex datasets. As genotyping platforms incorporate more SNPs, analyzing ever larger and more complex datasets is becoming standard practice. Hence, computationally efficient non-parametric methods are needed to reveal the population structure in genotypic datasets. In this study, we evaluated two leading non-parametric population structure analysis techniques, namely ipPCA and AWclust, on their ability to characterize the genetic diversity and population structure of two complex SNP genotype datasets (with as many as 243,855 SNPs). The head-to-head comparisons covered two major aspects: the ability to infer the number of genetically related subpopulations (K) and the ability to correctly assign individuals to these subpopulations. The experimental results suggested that AWclust could be more suitable when applied to a small and less complex dataset. However, with a large and more complex dataset, ipPCA is a much better choice, yielding higher accuracy in assigning genetically similar individuals to the inferred groups.
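The two comparison criteria named in the abstract, the inferred number of subpopulations (K) and the correctness of individual assignment, can be illustrated with a minimal sketch. The preview does not state which accuracy measure the authors used, so cluster purity is an assumption here, and the function names and toy labels are hypothetical, not taken from ipPCA or AWclust.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, predicted_clusters):
    """Fraction of individuals that fall in the majority population of their cluster.

    Assumption: purity stands in for the paper's (unstated) accuracy measure.
    """
    if len(true_labels) != len(predicted_clusters):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    # Count, for each predicted cluster, how many members come from each known population.
    members = defaultdict(Counter)
    for truth, cluster in zip(true_labels, predicted_clusters):
        members[cluster][truth] += 1

    # Each cluster contributes the size of its largest single-population group.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(true_labels)


def inferred_k(predicted_clusters):
    """Number of distinct subpopulations a method produced."""
    return len(set(predicted_clusters))


# Hypothetical toy data: six individuals from three known populations.
true_pops = ["A", "A", "A", "B", "B", "C"]
predicted = [0, 0, 1, 1, 1, 2]

print("inferred K:", inferred_k(predicted))
print("purity:", cluster_purity(true_pops, predicted))
```

Purity rewards clusters dominated by one known population but does not penalize over-splitting, which is why the inferred K is reported alongside it.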


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Deejai, P., Assawamakin, A., Wangkumhang, P., Poomputsa, K., Tongsima, S. (2010). On Assigning Individuals from Cryptic Population Structures to Optimal Predicted Subpopulations: An Empirical Evaluation of Non-parametric Population Structure Analysis Techniques. In: Chan, J.H., Ong, YS., Cho, SB. (eds) Computational Systems-Biology and Bioinformatics. CSBio 2010. Communications in Computer and Information Science, vol 115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16750-8_6

  • DOI: https://doi.org/10.1007/978-3-642-16750-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16749-2

  • Online ISBN: 978-3-642-16750-8

  • eBook Packages: Computer Science, Computer Science (R0)
