Abstract
Phylogenetic trees define a metric space over their vertices, an observation that underlies distance-based phylogenetic inference. Several authors, including Layer and Rhodes (2017), have noted that we can embed the leaves of a phylogenetic tree into high-dimensional Euclidean spaces in a way that minimizes the distortion of the tree distances. Jiang et al. (2021) use a deep learning approach to build a mapping from the space of sequences to Euclidean space such that the mapped sequences accurately preserve the leaf distances on a given tree. Their tool, DEPP, uses this map to place a new query sequence onto the tree by first embedding it, an idea that was particularly promising for updating a species tree given data from a single gene, despite the potential discordance between the gene tree and the species tree. In focusing on Euclidean spaces, these recent papers have ignored the strong theory suggesting that hyperbolic spaces are more appropriate for embedding the vertices of a tree. In this paper, we show that by moving to hyperbolic spaces and addressing challenges related to non-linearity and precision, we can reduce the distortion of distances for any given number of dimensions. The distortion of distances obtained using hyperbolic embeddings is lower than that obtained using Euclidean embeddings with the same number of dimensions, both in training (backbone) and testing (query). The low-distortion hyperbolic embeddings yield better topological accuracy than their Euclidean counterparts when updating species trees using a single gene. They also improve accuracy in placing queries for some datasets but not all.
References
Atteson, K.: The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica 25(2–3), 251–278 (1999). https://doi.org/10.1007/PL00008277
Bachmann, G., Bécigneul, G., Ganea, O.: Constant curvature graph convolutional networks. In: International Conference on Machine Learning, pp. 486–496. PMLR (2020)
Balaban, M., Jiang, Y., Roush, D., Zhu, Q., Mirarab, S.: Fast and accurate distance-based phylogenetic placement using divide and conquer. Mol. Ecol. Resour. (2021). https://doi.org/10.1111/1755-0998.13527
Balaban, M., Sarmashghi, S., Mirarab, S.: APPLES: scalable distance-based phylogenetic placement with or without alignments. Syst. Biol. 69(3), 566–578 (2020). https://doi.org/10.1093/sysbio/syz063
Billera, L.J., Holmes, S.P., Vogtmann, K.: Geometry of the space of phylogenetic trees. Adv. Appl. Math. 27(4), 733–767 (2001). https://doi.org/10.1006/aama.2001.0759
Chami, I., Ying, Z., Ré, C., Leskovec, J.: Hyperbolic graph convolutional neural networks. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Chen, W., et al.: Fully hyperbolic neural networks. arXiv preprint arXiv:2105.14686 (2021)
Corso, G., Ying, Z., Pándy, M., Veličković, P., Leskovec, J., Liò, P.: Neural distance embeddings for biological sequences. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Fitch, W.M., Margoliash, E.: Construction of phylogenetic trees. Science 155(3760), 279–284 (1967). https://doi.org/10.1126/science.155.3760.279
Ganea, O., Bécigneul, G., Hofmann, T.: Hyperbolic entailment cones for learning hierarchical embeddings. In: International Conference on Machine Learning, pp. 1646–1655. PMLR (2018)
Gascuel, O., Steel, M.: A ‘stochastic safety radius’ for distance-based tree reconstruction. Algorithmica 74(4), 1386–1403 (2016). https://doi.org/10.1007/s00453-015-0005-y
Jiang, Y., Balaban, M., Zhu, Q., Mirarab, S.: DEPP: deep learning enables extending species trees using single genes. bioRxiv (abstract in RECOMB 2021) (2021). https://doi.org/10.1101/2021.01.22.427808
Lagesen, K., Hallin, P., Rødland, E.A., Stærfeldt, H.H., Rognes, T., Ussery, D.W.: RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35(9), 3100–3108 (2007)
Layer, M., Rhodes, J.A.: Phylogenetic trees and Euclidean embeddings. J. Math. Biol. 74(1–2), 99–111 (2017). https://doi.org/10.1007/s00285-016-1018-0
Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32(10), 2798–2800 (2015). https://doi.org/10.1093/molbev/msv150
Linial, N., London, E., Rabinovich, Y.: The geometry of graphs and some of its algorithmic applications. Combinatorica 15(2), 215–245 (1995). https://doi.org/10.1007/BF01200757
Liu, Q., Nickel, M., Kiela, D.: Hyperbolic graph neural networks. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Mallo, D., De Oliveira Martins, L., Posada, D.: SimPhy: phylogenomic simulation of gene, locus, and species trees. Syst. Biol. 65(2), 334–344 (2016). https://doi.org/10.1093/sysbio/syv082
Matsumoto, H., Mimori, T., Fukunaga, T.: Novel metric for hyperbolic phylogenetic tree embeddings. Biol. Methods Protoc. 6(1), bpab006 (2021)
Mirarab, S., Warnow, T.: ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), i44–i52 (2015). https://doi.org/10.1093/bioinformatics/btv234
Nguyen, N.P.D., Mirarab, S., Kumar, K., Warnow, T.: Ultra-large alignments using phylogeny-aware profiles. Genome Biol. 16(1), 124 (2015). https://doi.org/10.1186/s13059-015-0688-z
Parks, D.H., et al.: A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat. Biotechnol. 36(10), 996–1004 (2018). https://doi.org/10.1038/nbt.4229
Robinson, D., Foulds, L.: Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131–147 (1981). http://www.sciencedirect.com/science/article/pii/0025556481900432
Sala, F., De Sa, C., Gu, A., Ré, C.: Representation tradeoffs for hyperbolic embeddings. In: International Conference on Machine Learning, pp. 4460–4469. PMLR (2018)
Sarkar, R.: Low distortion Delaunay embedding of trees in hyperbolic plane. In: van Kreveld, M., Speckmann, B. (eds.) GD 2011. LNCS, vol. 7034, pp. 355–366. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-25878-7_34
Shimizu, R., Mukuta, Y., Harada, T.: Hyperbolic neural networks++. arXiv preprint arXiv:2006.08210 (2020)
Skopek, O., Ganea, O.E., Bécigneul, G.: Mixed-curvature variational autoencoders (2020)
Tabaghi, P., Dokmanić, I.: Hyperbolic distance matrices. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1728–1738 (2020)
Tabaghi, P., Peng, J., Milenkovic, O., Dokmanić, I.: Geometry of similarity comparisons. arXiv preprint arXiv:2006.09858 (2020)
de Vienne, D.M., Ollier, S., Aguileta, G.: Phylo-MCOA: a fast and efficient method to detect outlier genes and species in phylogenomics using multiple co-inertia analysis. Mol. Biol. Evol. 29(6), 1587–1598 (2012). https://doi.org/10.1093/molbev/msr317
Zhu, Q., et al.: Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat. Commun. 10(1), 5477 (2019). https://doi.org/10.1038/s41467-019-13443-4
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jiang, Y., Tabaghi, P., Mirarab, S. (2022). Phylogenetic Placement Problem: A Hyperbolic Embedding Approach. In: Jin, L., Durand, D. (eds) Comparative Genomics. RECOMB-CG 2022. Lecture Notes in Computer Science, vol 13234. Springer, Cham. https://doi.org/10.1007/978-3-031-06220-9_5
DOI: https://doi.org/10.1007/978-3-031-06220-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06219-3
Online ISBN: 978-3-031-06220-9
eBook Packages: Computer Science, Computer Science (R0)