Abstract
Prokaryotic evolution is often described as the Spaghetti of Life because of massive genome dynamics (GD) events of gene gain and loss, which give the genes of an organism different evolutionary histories. These histories, dubbed gene trees, provide confounding signals that hamper attempts to reconstruct the species tree describing the main trend of evolution of the species under study. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing comparison of genomes with unequal gene content while also accounting for the order of their common genes. Recently, GD has been modelled as a continuous-time Markov process. Under this formulation, the distance between genes along the chromosome was shown to follow a birth-death-immigration process. Using classical results from birth-death theory, we recently showed that the SI measure is consistent under that formulation. In this work, we provide an alternative, stand-alone combinatorial proof of the same result. Using generating function techniques, we derive explicit expressions for the system’s probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically the expected distances between organisms based on a transformation of their SI. Although the expressions obtained are rather complex, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). The additivity proof relies on holonomic functions and the Zeilberger algorithm.
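To make the SI concrete, the following is a minimal sketch in Python. The abstract does not state the formula, so the definition used here is an assumption: for each gene shared by both genomes, take the set of genes within k positions on a circular chromosome, count how many of those neighbors are shared between the two genomes, divide by 2k, and average over the common genes. The function names `k_neighborhood` and `synteny_index` are hypothetical, and the sketch assumes each gene occurs once per genome and that each genome has more than 2k genes.

```python
from typing import Sequence, Set


def k_neighborhood(genome: Sequence[str], idx: int, k: int) -> Set[str]:
    """Genes within k positions of genome[idx] on a circular chromosome,
    excluding the gene itself (assumes len(genome) > 2 * k)."""
    n = len(genome)
    return {genome[(idx + off) % n] for off in range(-k, k + 1) if off != 0}


def synteny_index(g1: Sequence[str], g2: Sequence[str], k: int) -> float:
    """Average, over genes common to both genomes, of the fraction of
    k-neighbors shared between the two genomes (assumed SI definition)."""
    pos1 = {gene: i for i, gene in enumerate(g1)}
    pos2 = {gene: i for i, gene in enumerate(g2)}
    common = pos1.keys() & pos2.keys()
    if not common:
        return 0.0
    total = 0.0
    for gene in common:
        n1 = k_neighborhood(g1, pos1[gene], k)
        n2 = k_neighborhood(g2, pos2[gene], k)
        total += len(n1 & n2) / (2 * k)
    return total / len(common)


if __name__ == "__main__":
    ancestor = list("abcdefgh")
    # Hypothetical descendant: genes d..h lost, genes x, y, z gained.
    descendant = list("abcxyz")
    print(synteny_index(ancestor, ancestor, k=2))
    print(synteny_index(ancestor, descendant, k=1))
```

Under this assumed definition, identical genomes score 1, and the score drops as gene gain and loss break up shared neighborhoods. The transformation of SI into an additive distance is the subject of the paper and is not reproduced here.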
SS was supported by the Israel Science Foundation (grant no. ISF 1927/21) and by the American/Israeli Binational Science Foundation (grant no. BSF 2021139).
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Katriel, G., Mahanaymi, U., Koutschan, C., Zeilberger, D., Steel, M., Snir, S. (2023). Using Generating Functions to Prove Additivity of Gene-Neighborhood Based Phylogenetics - Extended Abstract. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_10
DOI: https://doi.org/10.1007/978-981-99-7074-2_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7073-5
Online ISBN: 978-981-99-7074-2
eBook Packages: Computer Science (R0)