Phylogenetic Placement Problem: A Hyperbolic Embedding Approach

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13234)

Abstract

Phylogenetic trees define a metric space over their vertices, an observation that underlies distance-based phylogenetic inference. Several authors, including Layer and Rhodes (2017), have noted that the leaves of a phylogenetic tree can be embedded into high-dimensional Euclidean spaces in a way that minimizes the distortion of the tree distances. Jiang et al. (2021) use a deep learning approach to build a mapping from the space of sequences to a Euclidean space such that the mapped sequences accurately preserve the leaf distances on a given tree. Their tool, DEPP, uses this map to place a new query sequence onto the tree by first embedding it, an idea that was particularly promising for updating a species tree given data from a single gene, despite the potential discordance between the gene tree and the species tree. In focusing on Euclidean spaces, these recent papers have ignored the strong theory suggesting that hyperbolic spaces are more appropriate for embedding the vertices of a tree. In this paper, we show that by moving to hyperbolic spaces and addressing challenges related to non-linearity and precision, we can reduce the distortion of distances for any given number of dimensions. The distortion of distances obtained using hyperbolic embeddings is lower than that obtained using Euclidean embeddings with the same number of dimensions, both in training (backbone) and testing (query). The low-distortion distances of the hyperbolic embeddings result in better topological accuracy in updating species trees using a single gene than the Euclidean counterpart achieves. Hyperbolic embedding also improves accuracy in placing queries for some datasets, but not all.
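The comparison described in the abstract, distortion of tree distances under a Euclidean versus a hyperbolic embedding of the same dimension, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which model of hyperbolic space, which curvature, or which distortion measure the paper uses, so the hyperboloid model with curvature -1, the mean squared error, all function names, and the toy quartet tree and coordinates below are assumptions made for the example.

```python
import numpy as np


def lift_to_hyperboloid(v):
    """Map points in R^d onto the unit hyperboloid in R^(d+1) (assumed model)."""
    v = np.asarray(v, dtype=np.float64)
    x0 = np.sqrt(1.0 + np.sum(v * v, axis=-1, keepdims=True))
    return np.concatenate([x0, v], axis=-1)


def hyperbolic_distances(v):
    """Pairwise hyperbolic distances, d(x, y) = arccosh(-<x, y>_L)."""
    x = lift_to_hyperboloid(v)
    # Lorentzian inner product: -x0*y0 + sum_i xi*yi
    inner = x[:, 1:] @ x[:, 1:].T - np.outer(x[:, 0], x[:, 0])
    # Clip to the domain of arccosh; rounding can push -inner slightly below 1.
    return np.arccosh(np.maximum(-inner, 1.0))


def euclidean_distances(v):
    """Pairwise Euclidean distances between the rows of v."""
    v = np.asarray(v, dtype=np.float64)
    diff = v[:, None, :] - v[None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=-1))


def mean_squared_distortion(embedded, tree):
    """Mean squared difference over all unordered leaf pairs (assumed measure)."""
    iu = np.triu_indices_from(tree, k=1)
    return float(np.mean((embedded[iu] - tree[iu]) ** 2))


# Toy quartet ((a,b),(c,d)) with unit branch lengths: hypothetical data.
tree = np.array([
    [0.0, 2.0, 3.0, 3.0],
    [2.0, 0.0, 3.0, 3.0],
    [3.0, 3.0, 0.0, 2.0],
    [3.0, 3.0, 2.0, 0.0],
])
# Arbitrary, unoptimized 2-D coordinates used for both geometries.
coords = np.array([[-1.5, 0.8], [-1.5, -0.8], [1.5, 0.8], [1.5, -0.8]])

print("Euclidean distortion: ", mean_squared_distortion(euclidean_distances(coords), tree))
print("Hyperbolic distortion:", mean_squared_distortion(hyperbolic_distances(coords), tree))
```

The coordinates here are not optimized, so the two printed numbers say nothing about which geometry is better; in practice each embedding would be fitted to the tree distances before the distortions are compared. The clipping inside `hyperbolic_distances` hints at the precision issue the abstract mentions: near-identical points give an arccosh argument very close to 1, where floating-point rounding matters.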

References

  1. Atteson, K.: The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica 25(2–3), 251–278 (1999). https://doi.org/10.1007/PL00008277

  2. Bachmann, G., Bécigneul, G., Ganea, O.: Constant curvature graph convolutional networks. In: International Conference on Machine Learning, pp. 486–496. PMLR (2020)

  3. Balaban, M., Jiang, Y., Roush, D., Zhu, Q., Mirarab, S.: Fast and accurate distance-based phylogenetic placement using divide and conquer. Mol. Ecol. Resour. (2021). https://doi.org/10.1111/1755-0998.13527. https://onlinelibrary.wiley.com/doi/10.1111/1755-0998.13527

  4. Balaban, M., Sarmashghi, S., Mirarab, S.: APPLES: scalable distance-based phylogenetic placement with or without alignments. Syst. Biol. 69(3), 566–578 (2020). https://doi.org/10.1093/sysbio/syz063. https://academic.oup.com/sysbio/advance-article/doi/10.1093/sysbio/syz063/5572672. https://academic.oup.com/sysbio/article/69/3/566/5572672

  5. Billera, L.J., Holmes, S.P., Vogtmann, K.: Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001). https://doi.org/10.1006/aama.2001.0759

  6. Chami, I., Ying, Z., Ré, C., Leskovec, J.: Hyperbolic graph convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  7. Chen, W., et al.: Fully hyperbolic neural networks. arXiv preprint arXiv:2105.14686 (2021)

  8. Corso, G., Ying, Z., Pándy, M., Veličković, P., Leskovec, J., Liò, P.: Neural distance embeddings for biological sequences. In: Advances in Neural Information Processing Systems, vol. 34 (2021)

  9. Fitch, W.M., Margoliash, E.: Construction of phylogenetic trees. Science 155(3760), 279–284 (1967). https://doi.org/10.1126/science.155.3760.279. https://www.science.org/doi/10.1126/science.155.3760.279

  10. Ganea, O., Bécigneul, G., Hofmann, T.: Hyperbolic entailment cones for learning hierarchical embeddings. In: International Conference on Machine Learning, pp. 1646–1655. PMLR (2018)

  11. Gascuel, O., Steel, M.: A ‘stochastic safety radius’ for distance-based tree reconstruction. Algorithmica 74(4), 1386–1403 (2016). https://doi.org/10.1007/s00453-015-0005-y. http://link.springer.com/10.1007/s00453-015-0005-y

  12. Jiang, Y., Balaban, M., Zhu, Q., Mirarab, S.: DEPP: deep learning enables extending species trees using single genes. bioRxiv (abstract in RECOMB 2021) (2021). https://doi.org/10.1101/2021.01.22.427808. http://biorxiv.org/content/early/2021/01/24/2021.01.22.427808.abstract

  13. Lagesen, K., Hallin, P., Rødland, E.A., Stærfeldt, H.H., Rognes, T., Ussery, D.W.: RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35(9), 3100–3108 (2007)

  14. Layer, M., Rhodes, J.A.: Phylogenetic trees and Euclidean embeddings. J. Math. Biol. 74(1–2), 99–111 (2017). https://doi.org/10.1007/s00285-016-1018-0

  15. Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32(10), 2798–2800 (2015). https://doi.org/10.1093/molbev/msv150. http://mbe.oxfordjournals.org/lookup/doi/10.1093/molbev/msv150

  16. Linial, N., London, E., Rabinovich, Y.: The geometry of graphs and some of its algorithmic applications. Combinatorica 15(2), 215–245 (1995). https://doi.org/10.1007/BF01200757

  17. Liu, Q., Nickel, M., Kiela, D.: Hyperbolic graph neural networks. In: Advances in Neural Information Processing Systems, vol. 32 (2019)

  18. Mallo, D., De Oliveira Martins, L., Posada, D.: SimPhy: phylogenomic simulation of gene, locus, and species trees. Syst. Biol. 65(2), 334–344 (2016). https://doi.org/10.1093/sysbio/syv082. http://sysbio.oxfordjournals.org/content/early/2015/12/04/sysbio.syv082.short?rss=1. https://academic.oup.com/sysbio/article-lookup/doi/10.1093/sysbio/syv082. http://www.ncbi.nlm.nih.gov/pubmed/265

  19. Matsumoto, H., Mimori, T., Fukunaga, T.: Novel metric for hyperbolic phylogenetic tree embeddings. Biol. Methods Protoc. 6(1), bpab006 (2021)

  20. Mirarab, S., Warnow, T.: ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), i44–i52 (2015). https://doi.org/10.1093/bioinformatics/btv234. http://bioinformatics.oxfordjournals.org/cgi/content/long/31/12/i44. http://bioinformatics.oxfordjournals.org/lookup/doi/10.1093/bioinformatics/btv234

  21. Nguyen, N.P.D., Mirarab, S., Kumar, K., Warnow, T.: Ultra-large alignments using phylogeny-aware profiles. Genome Biol. 16(1), 124 (2015). https://doi.org/10.1186/s13059-015-0688-z. http://genomebiology.com/2015/16/1/124. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0688-z

  22. Parks, D.H., et al.: A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36(10), 996–1004 (2018). https://doi.org/10.1038/nbt.4229. http://www.nature.com/articles/nbt.4229

  23. Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131–147 (1981). http://www.sciencedirect.com/science/article/pii/0025556481900432

  24. Sala, F., De Sa, C., Gu, A., Ré, C.: Representation tradeoffs for hyperbolic embeddings. In: International Conference on Machine Learning, pp. 4460–4469. PMLR (2018)

  25. Sarkar, R.: Low distortion Delaunay embedding of trees in hyperbolic plane. In: van Kreveld, M., Speckmann, B. (eds.) GD 2011. LNCS, vol. 7034, pp. 355–366. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-25878-7_34

  26. Shimizu, R., Mukuta, Y., Harada, T.: Hyperbolic neural networks++. arXiv preprint arXiv:2006.08210 (2020)

  27. Skopek, O., Ganea, O.E., Bécigneul, G.: Mixed-curvature variational autoencoders (2020)

  28. Tabaghi, P., Dokmanić, I.: Hyperbolic distance matrices. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1728–1738 (2020)

  29. Tabaghi, P., Peng, J., Milenkovic, O., Dokmanić, I.: Geometry of similarity comparisons. arXiv preprint arXiv:2006.09858 (2020)

  30. de Vienne, D.M., Ollier, S., Aguileta, G.: Phylo-MCOA: a fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis. Mol. Biol. Evol. 29(6), 1587–1598 (2012). https://doi.org/10.1093/molbev/msr317. https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msr317

  31. Zhu, Q., et al.: Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat. Commun. 10(1), 5477 (2019). https://doi.org/10.1038/s41467-019-13443-4. http://www.nature.com/articles/s41467-019-13443-4

Author information

Corresponding author

Correspondence to Siavash Mirarab.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jiang, Y., Tabaghi, P., Mirarab, S. (2022). Phylogenetic Placement Problem: A Hyperbolic Embedding Approach. In: Jin, L., Durand, D. (eds) Comparative Genomics. RECOMB-CG 2022. Lecture Notes in Computer Science, vol 13234. Springer, Cham. https://doi.org/10.1007/978-3-031-06220-9_5

  • DOI: https://doi.org/10.1007/978-3-031-06220-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06219-3

  • Online ISBN: 978-3-031-06220-9

  • eBook Packages: Computer Science, Computer Science (R0)
