Fast Phylogenetic Biodiversity Computations Under a Non-uniform Random Distribution

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9649)

Abstract

Computing the phylogenetic diversity of a set of species is an important part of many ecological case studies. More specifically, let \(\mathcal{T}\) be a phylogenetic tree, and let R be a subset of its leaves representing the species under study. Ecologists want to evaluate a function \(f(\mathcal{T},R)\) (a phylogenetic measure) that quantifies the evolutionary distance between the elements in R. In most applications, however, it is also important to examine how \(f(\mathcal{T},R)\) behaves when R is selected at random. The standard way to do this is to compute the mean and the variance of f over all subsets of leaves in \(\mathcal{T}\) that consist of exactly \(|R| = r\) elements. For certain measures, there exist algorithms that can compute these statistics, provided that all subsets of r leaves are equiprobable. So far, however, no algorithm can do this exactly when the leaves in \(\mathcal{T}\) are weighted with unequal probabilities. As a consequence, in this general setting, specialists compute the statistics of phylogenetic measures using methods that are both inexact and very slow.
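To make the setting concrete, the following is a minimal brute-force sketch in Python, not the algorithm of the paper. It assumes a common rooted definition of PD (total length of all edges on a root-to-leaf path for some leaf in R) and defines MPD as the average path length over unordered leaf pairs; other PD variants exist. The toy tree and all names (`PARENT`, `pd_measure`, `mpd_measure`, `uniform_moments`) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
PARENT = {
    "x": ("root", 1.0),
    "y": ("root", 2.0),
    "a": ("x", 1.0),
    "b": ("x", 3.0),
    "c": ("y", 2.0),
    "d": ("y", 1.0),
}
LEAVES = ["a", "b", "c", "d"]


def path_to_root(node):
    """Edges (each identified by its child node) from node up to the root."""
    edges = []
    while node in PARENT:
        edges.append(node)
        node = PARENT[node][0]
    return edges


def pd_measure(subset):
    """Rooted PD (assumed definition): total length of edges above any leaf in subset."""
    used = set()
    for leaf in subset:
        used.update(path_to_root(leaf))
    return sum(PARENT[e][1] for e in used)


def distance(u, v):
    """Path length between two leaves: edges on exactly one of the two root paths."""
    eu, ev = set(path_to_root(u)), set(path_to_root(v))
    return sum(PARENT[e][1] for e in eu ^ ev)


def mpd_measure(subset):
    """Mean pairwise distance over unordered leaf pairs (needs at least two leaves)."""
    pairs = list(combinations(subset, 2))
    return sum(distance(u, v) for u, v in pairs) / len(pairs)


def uniform_moments(measure, r):
    """Mean and variance of measure over all equiprobable r-subsets (exponential time)."""
    values = [measure(s) for s in combinations(LEAVES, r)]
    return mean(values), pvariance(values)
```

Enumerating every subset is feasible only for very small trees, which is why closed-form or polynomial-time computations of these moments matter in practice.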

We present, for the first time, exact and efficient algorithms for computing the mean and the variance of phylogenetic measures when leaf subsets of fixed size are selected from \(\mathcal{T}\) under a non-uniform random distribution. In particular, let \(\mathcal{T}\) be a tree that has n nodes and depth d, and let r be a non-negative integer. We show how to compute, in \(O((d+\log n) n \log n)\) time and O(n) space, the mean and the variance for any measure that belongs to a well-defined class. We show that two of the most popular phylogenetic measures belong to this class: the Phylogenetic Diversity (\(\mathrm{PD}\)) and the Mean Pairwise Distance (\(\mathrm{MPD}\)). The random distribution that we consider is the Poisson binomial distribution restricted to subsets of fixed size r. We also provide a stronger result: for the \(\mathrm{PD}\) and the \(\mathrm{MPD}\), we describe algorithms that compute, in a batched manner, the mean and variance on \(\mathcal{T}\) for all possible leaf-subset sizes in \(O((d+\log n) n \log n)\) time and O(n) space.
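The distribution named above can be illustrated with a short Python sketch. It assumes the standard formulation of the fixed-size Poisson binomial (also called the conditional Bernoulli distribution): each leaf i is included independently with probability p_i, and the outcome is conditioned on exactly r leaves being selected, so that a subset R has probability proportional to the product of the odds p_i/(1-p_i) over i in R. The function names are hypothetical, and the simple O(nr) dynamic program below is a textbook technique, not the paper's \(O((d+\log n) n \log n)\) method.

```python
from math import prod


def elementary_symmetric(weights, r):
    """Return [e_0, ..., e_r], the elementary symmetric polynomials of weights."""
    e = [1.0] + [0.0] * r
    for w in weights:
        # Update from high to low degree so each weight is used at most once.
        for k in range(r, 0, -1):
            e[k] += w * e[k - 1]
    return e


def subset_probability(probs, subset, r):
    """P(R = subset | |R| = r); assumes 0 < p_i < 1 for every leaf."""
    assert len(subset) == r
    odds = [p / (1.0 - p) for p in probs]
    return prod(odds[i] for i in subset) / elementary_symmetric(odds, r)[r]


def inclusion_probabilities(probs, r):
    """Probability that each leaf belongs to the selected subset of size r."""
    assert 1 <= r <= len(probs)
    odds = [p / (1.0 - p) for p in probs]
    total = elementary_symmetric(odds, r)[r]
    result = []
    for i, w in enumerate(odds):
        rest = odds[:i] + odds[i + 1:]
        # Leaf i is selected together with any (r-1)-subset of the other leaves.
        result.append(w * elementary_symmetric(rest, r - 1)[r - 1] / total)
    return result
```

When all p_i are equal, every r-subset receives the same probability, which recovers the uniform model; the inclusion probabilities always sum to r.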

For the \(\mathrm{PD}\) and \(\mathrm{MPD}\), we implemented the algorithms that perform batched computations of the mean and variance. We also developed alternative implementations that compute the same output in \(O((d+\log n) n^2)\) time. For both types of implementations, we conducted experiments and measured their performance in practice. Despite the difference in theoretical performance, we show that the algorithms that run in \(O((d+\log n) n^2)\) time are more efficient in practice and numerically more stable. We also compared the performance of these algorithms with standard inexact methods that can be used in case studies. We show that our algorithms are far faster, making it possible to process much larger datasets than before. Our implementations will become publicly available through the R package PhyloMeasures.
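For contrast, an inexact baseline can be sketched as follows. The abstract does not specify which inexact methods were compared; this sketch assumes a sampling-based (Monte Carlo) estimator, with subsets drawn by simple rejection sampling. The star-shaped toy tree, where the rooted PD of a subset is just the sum of its leaf branch lengths, and all names are hypothetical.

```python
import random
from itertools import combinations
from math import prod


def sample_fixed_size(probs, r, rng):
    """Rejection sampling: redraw independent inclusions until exactly r are chosen."""
    while True:
        subset = [i for i, p in enumerate(probs) if rng.random() < p]
        if len(subset) == r:
            return subset


def monte_carlo_mean(measure, probs, r, samples, rng):
    """Estimate the mean of measure over random r-subsets from `samples` draws."""
    draws = (measure(sample_fixed_size(probs, r, rng)) for _ in range(samples))
    return sum(draws) / samples


def exact_mean_bruteforce(measure, probs, r):
    """Exact mean by enumerating all r-subsets; exponential, for tiny inputs only."""
    odds = [p / (1.0 - p) for p in probs]  # assumes 0 < p_i < 1
    subsets = list(combinations(range(len(probs)), r))
    weights = [prod(odds[i] for i in s) for s in subsets]
    return sum(w * measure(s) for w, s in zip(weights, subsets)) / sum(weights)


# Hypothetical star tree: each leaf hangs directly off the root.
BRANCH = [1.0, 2.0, 0.5, 3.0, 1.5]


def star_pd(subset):
    """Rooted PD on the star tree: sum of the selected leaves' branch lengths."""
    return sum(BRANCH[i] for i in subset)
```

The estimate converges only at a rate of roughly one over the square root of the number of samples, and rejection sampling wastes draws when subsets of size r are unlikely, which illustrates why such approaches are both approximate and slow on large trees.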

MADALGO—Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation.

Author information

Corresponding author

Correspondence to Constantinos Tsirogiannis.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Tsirogiannis, C., Sandel, B. (2016). Fast Phylogenetic Biodiversity Computations Under a Non-uniform Random Distribution. In: Singh, M. (ed.) Research in Computational Molecular Biology. RECOMB 2016. Lecture Notes in Computer Science, vol 9649. Springer, Cham. https://doi.org/10.1007/978-3-319-31957-5_16

  • DOI: https://doi.org/10.1007/978-3-319-31957-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31956-8

  • Online ISBN: 978-3-319-31957-5

  • eBook Packages: Computer Science, Computer Science (R0)
