Abstract
Orthologous genes are genes that arise from speciation events and tend to be highly conserved in form and function. Several algorithms seek to identify them, but no simple methodology is available to assess the quality of their results. This work proposes using the definition of orthologs and the analysis of phylogenetic trees to develop a methodology for comparing these algorithms. Thirty prokaryotic proteomes were obtained, focusing on the genera Leifsonia and Clavibacter. Orthogroups were inferred using five graph-based algorithms (OMA, OrthoFinder, PorthoMCL, Proteinortho, and SonicParanoid). The frequency of each homologous group was obtained from the resulting raw data. The sequences were aligned with MUSCLE, trimmed with trimAl, and concatenated into supermatrices. The percentage of information in each supermatrix was calculated. Phylogenetic trees were built using three tree reconstruction methods: Maximum Likelihood, Bayesian inference, and Neighbor-Joining. Reference trees were built from 16S ribosomal RNA sequences. In addition, gene trees for orthogroups containing all 30 taxa were inferred by Maximum Likelihood. The trees were compared to the reference tree by topology and by Robinson-Foulds distance. Despite differences in the number of orthogroups obtained from each algorithm, no significant differences were observed between the constructed trees. However, previous work with other, distinct species indicated that this methodology may be viable. It is concluded that the proposed methodology is valid, although not for all species groups. Because the results depend on the input data, the methodology should be applied anew to each data set.
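To make the tree comparison step concrete, the following is a minimal sketch of an unrooted Robinson-Foulds distance between two trees over the same taxa. It is illustrative only and is not the authors' implementation: the abstract does not say which software computed the distances, and the nested-tuple tree encoding, the function names, and the five toy taxa are all assumptions introduced here.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`; append each internal node's leaf set to `out`."""
    if isinstance(tree, str):  # a leaf is a taxon name
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def bipartitions(tree):
    """Return (all taxa, set of non-trivial splits) for a nested-tuple tree."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)  # fixed taxon used to pick one canonical side per split
    splits = set()
    for clade in clades:
        # Always store the side that does NOT contain the anchor taxon.
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf against the rest, or the empty split).
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
    return all_leaves, splits


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    leaves_a, splits_a = bipartitions(tree_a)
    leaves_b, splits_b = bipartitions(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same taxa")
    return len(splits_a ^ splits_b)


# Hypothetical toy trees: a reference topology and one candidate topology.
reference = ((("A", "B"), "C"), ("D", "E"))
candidate = ((("A", "C"), "B"), ("D", "E"))

print(robinson_foulds(reference, reference))  # 0
print(robinson_foulds(reference, candidate))  # 2
```

A distance of 0 means the two topologies contain exactly the same splits. In this setting, a smaller distance to the 16S reference tree would indicate that an algorithm's orthogroups yield a topology closer to the reference.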
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Oliveira, R., de Castro Leite, S., Almeida, F.N. (2022). Phylogeny Trees as a Tool to Compare Inference Algorithms of Orthologs. In: Scherer, N.M., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2022. Lecture Notes in Computer Science, vol 13523. Springer, Cham. https://doi.org/10.1007/978-3-031-21175-1_14
DOI: https://doi.org/10.1007/978-3-031-21175-1_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21174-4
Online ISBN: 978-3-031-21175-1
eBook Packages: Computer Science, Computer Science (R0)