Phylogeny Trees as a Tool to Compare Inference Algorithms of Orthologs

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2022)

Abstract

Orthologous genes are genes that arise from speciation events and are highly conserved in form and function. Several algorithms seek to identify them, but no simple methodology is available to assess the quality of their results. This work proposes a methodology for comparing these algorithms that is based on the definition of orthologs and on the analysis of phylogenetic trees. Thirty prokaryotic proteomes were obtained, focusing on the genera Leifsonia and Clavibacter. Orthogroups were inferred using five graph-based algorithms (OMA, OrthoFinder, PorthoMCL, Proteinortho, and SonicParanoid). The frequency of each homologous group was obtained from the resulting raw data. The sequences were aligned with MUSCLE, trimmed with trimAl, and concatenated into supermatrices, and the percentage of information in each supermatrix was calculated. Phylogenetic trees were built with three tree reconstruction methods: Maximum Likelihood, Bayesian inference, and Neighbor-Joining. The reference trees were built from 16S ribosomal RNA sequences. In addition, gene trees were inferred by Maximum Likelihood from orthogroups containing 30 taxa. The trees were compared to the reference tree by topology and by Robinson-Foulds distance. Despite differences in the number of orthogroups obtained by each algorithm, no significant differences were observed between the constructed trees. However, previous work with other, distinct species indicated that this methodology may be viable. It is concluded that the proposed methodology is valid, although not for all species groups. Because the results depend on the input data, it is recommended that the methodology be applied anew to each data set.
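
The Robinson-Foulds comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which software computed the distances, so the code below is only an assumption-laden illustration: trees are written as nested Python tuples, the taxon labels `A` to `E` are hypothetical, and the functions `leaf_set`, `bipartitions`, and `robinson_foulds` are names introduced here, not part of any tool used in the paper. The distance is taken as the number of non-trivial bipartitions present in exactly one of the two trees, treating both trees as unrooted.

```python
def leaf_set(tree):
    """Return the set of taxon labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal edges.

    Each split is stored as the side that does NOT contain an arbitrary
    anchor taxon, so the result is independent of where the tree is rooted.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)  # arbitrary but deterministic reference taxon
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if anchor in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Count the splits found in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must contain the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon trees: a reference tree and a gene tree.
reference_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
rf_distance = robinson_foulds(reference_tree, gene_tree)
```

A distance of zero means that the two trees share every non-trivial split; larger values indicate more topological disagreement with the reference tree. For real analyses with 30 taxa and Newick input, an established phylogenetics library would be a more practical choice than this sketch.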

Author information

Corresponding author

Correspondence to Fernanda Nascimento Almeida.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Oliveira, R., de Castro Leite, S., Almeida, F.N. (2022). Phylogeny Trees as a Tool to Compare Inference Algorithms of Orthologs. In: Scherer, N.M., de Melo-Minardi, R.C. (eds) Advances in Bioinformatics and Computational Biology. BSB 2022. Lecture Notes in Computer Science, vol 13523. Springer, Cham. https://doi.org/10.1007/978-3-031-21175-1_14

  • DOI: https://doi.org/10.1007/978-3-031-21175-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21174-4

  • Online ISBN: 978-3-031-21175-1

  • eBook Packages: Computer Science, Computer Science (R0)
