DOI: 10.1145/1854776.1854826
research-article

A graph and hierarchical clustering based approach for population structure inference

Published: 02 August 2010

Abstract

The problem of population structure inference arises in many contexts of population genetics. However, processing genotype data with thousands of loci during inference is time-consuming, and prediction accuracy is often degraded by noise in the genotypes. In this paper, a novel approach is proposed for rapidly inferring population structure from genotype data. TagSNPs are chosen as the principal features. A graph-based feature selection method is used to eliminate invalid loci and reduce the dimensionality of the genotype data. Then, a hierarchical clustering algorithm combined with information theory is employed to predict the population membership of each individual. The performance of our approach is evaluated on simulated data, where it achieves higher prediction accuracy than STRUCTURE. We also apply our feature selection results to STRUCTURE, and the experimental results show that our feature selection approach can be used with other inference methods to reduce running time and improve prediction accuracy.
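The two-stage pipeline described above (prune loci with a graph, then cluster individuals hierarchically) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: genotypes are coded as 0/1/2 allele counts; the graph is a linkage-disequilibrium graph in which an edge joins two loci whose squared correlation r² reaches a hypothetical threshold of 0.8; one locus is kept per connected component; and individuals are grouped by plain average-linkage clustering on an allele-sharing distance. The authors' own tagSNP selection and information-theoretic clustering criterion are not reproduced here, and `select_tag_loci`, `average_linkage`, `allele_distance`, and the threshold value are hypothetical names and parameters chosen for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance


def r_squared(x, y):
    """Squared Pearson correlation between two loci (genotypes coded 0/1/2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def select_tag_loci(genotypes, r2_threshold=0.8):
    """Hypothetical graph-based selection (an assumption, not the paper's method):
    loci are nodes, an edge joins two loci whose r^2 >= r2_threshold, and one
    locus per connected component is kept."""
    n_loci = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_loci)]
    # Monomorphic loci carry no information about structure, so drop them first.
    kept = [j for j in range(n_loci) if pvariance(columns[j]) > 0]
    parent = {j: j for j in kept}  # union-find forest over the remaining loci

    def find(j):
        while parent[j] != j:
            parent[j] = parent[parent[j]]  # path halving
            j = parent[j]
        return j

    for a, b in combinations(kept, 2):
        if r_squared(columns[a], columns[b]) >= r2_threshold:
            parent[find(a)] = find(b)

    # kept is ascending, so setdefault records the lowest-indexed locus per component.
    tags = {}
    for j in kept:
        tags.setdefault(find(j), j)
    return sorted(tags.values())


def allele_distance(u, v):
    """Allele-sharing distance in [0, 1]: mean of |g_u - g_v| / 2 over loci."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (2 * len(u))


def average_linkage(points, k):
    """Agglomerative clustering with average linkage (UPGMA) until k clusters
    remain; returns one integer label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = mean(allele_distance(points[i], points[j])
                     for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps a's index valid
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: 6 individuals x 5 loci; individuals 0-2 and 3-5 form two groups.
genotypes = [
    [0, 0, 1, 2, 1],
    [0, 0, 1, 2, 0],
    [1, 1, 1, 2, 1],
    [2, 2, 1, 0, 1],
    [2, 2, 1, 0, 2],
    [1, 1, 1, 0, 1],
]
tags = select_tag_loci(genotypes)
reduced = [[row[j] for j in tags] for row in genotypes]
labels = average_linkage(reduced, k=2)
print("tag loci:", tags)
print("cluster labels:", labels)
```

Running it prints the tag loci `[0, 3, 4]` and the labels `[0, 0, 0, 1, 1, 1]`: loci 0 and 1 are perfectly correlated and collapse to a single tag, the monomorphic locus 2 is discarded, and the six individuals split into the two groups they were built from. The pairwise loops are quadratic in both loci and individuals, so this sketch is meant for illustration rather than for data sets with thousands of loci.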

        Published In

        BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
        August 2010
        705 pages
        ISBN:9781450304382
        DOI:10.1145/1854776

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. clustering
        2. distance-based
        3. feature selection
        4. information theory
        5. population structure inference
        6. tagSNPs

        Conference

        BCB'10

        Acceptance Rates

        Overall Acceptance Rate 254 of 885 submissions, 29%