Measuring Cluster Stability in a Large Scale Phylogenetic Analysis of Functional Genes in Metagenomes Using pplacer


Abstract:

Analysis of metagenomic sequence data requires a multi-stage workflow. The results of each intermediate step carry an inherent uncertainty and potentially affect the as-yet-unmeasured statistical significance of downstream analyses. Here, we describe our phylogenetic analysis pipeline, which uses the pplacer program to place many shotgun sequences corresponding to a single functional gene onto a fixed phylogenetic tree. We then use the squash clustering method to compare multiple samples with respect to that gene. We approximate the statistical significance of each gene's clustering result by measuring its cluster stability: the consistency of that result, as measured by the adjusted Rand Index, when the probabilistic placements made by pplacer are systematically reassigned and the samples are clustered again. We find that, among the genes investigated, the majority of analyses are stable, based on the average adjusted Rand Index. We investigated properties of each gene that may explain less stable results. When examined in aggregate, genes with less stable results tended to have less convex reference trees, fewer total reads recruited to the gene, and a greater Expected Distance between Placement Locations as given by pplacer. However, for an individual functional gene, these measures alone do not predict cluster stability.
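
The stability measure described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that each clustering of the samples has already been reduced to a flat list of cluster labels (one per sample), and the names `adjusted_rand_index` and `mean_stability` are hypothetical. The first function computes the standard adjusted Rand Index from pair counts; the second averages it over a set of re-clusterings, as one plausible reading of "average adjusted Rand Index".

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same samples")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no sample pairs to disagree on

    # Contingency counts: samples sharing a cluster in A, in B, and in both.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both clusterings are trivial
    return (sum_cells - expected) / (max_index - expected)


def mean_stability(baseline, reclusterings):
    """Average ARI between a baseline clustering and each re-clustering.

    Assumption: each entry of `reclusterings` comes from clustering the
    samples again after the placements were reassigned.
    """
    scores = [adjusted_rand_index(baseline, labels) for labels in reclusterings]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    baseline = [0, 0, 1, 1]                  # four samples, two clusters
    perturbed = [[1, 1, 0, 0], [0, 1, 0, 1]]  # hypothetical re-clusterings
    print(mean_stability(baseline, perturbed))
```

A value near 1 means the re-clusterings reproduce the baseline grouping regardless of label names, while values near or below 0 indicate agreement no better than chance.
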
Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics (Volume: 13, Issue: 2, 01 March-April 2016)
Page(s): 341 - 349
Date of Publication: 17 June 2015

PubMed ID: 27045832
