Reducing the Impact of Domain Rearrangement on Sequence Alignment and Phylogeny Reconstruction

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2023)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14248)

Abstract

Existing computational approaches for studying gene family evolution generally do not account for domain rearrangement within gene families. However, it is well known that protein domain architectures often differ between genes belonging to the same gene family. In particular, domain shuffling can lead to out-of-order domains which, unless explicitly accounted for, can significantly impact even the most fundamental of tasks such as multiple sequence alignment and phylogeny inference.
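As a toy illustration of why out-of-order domains are a problem for standard alignment, the sketch below treats each gene as an ordered list of domain labels and computes the longest set of domains that can be matched while preserving order, which is all a collinear alignment can pair up. This is not one of the methods proposed in the paper; the function name `collinear_matches` and the domain labels are hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def collinear_matches(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return a longest common subsequence of two domain architectures.

    A collinear (order-preserving) alignment can only pair domains that
    appear in the same relative order in both genes, so this is an upper
    bound on the homologous domains such an alignment can match.
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Trace back through the table to recover one optimal set of matches.
    matched: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            matched.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return matched


# Hypothetical architectures: gene2 is gene1 with the first domain moved last.
gene1 = ["SH3", "SH2", "Kinase"]
gene2 = ["SH2", "Kinase", "SH3"]
print(collinear_matches(gene1, gene2))
```

In this example the two genes share all three domains, yet only two of them can be matched in order; the shuffled domain would be aligned against non-homologous residues or gaps unless the rearrangement is handled explicitly.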

In this work, we make progress towards addressing this important but often overlooked problem. Specifically, we (i) demonstrate the impact of protein domain shuffling and rearrangement on multiple sequence alignment and gene tree reconstruction accuracy, (ii) propose two new computational methods for correcting gene sequences and alignments for improved gene tree reconstruction accuracy and evaluate them using realistically simulated datasets, and (iii) assess the potential impact of our new methods and of two existing approaches, MDAT and ProDA, in practice by applying them to biological gene families. We find that the methods work very well on simulated data but that performance of all methods is mixed, and often complementary, on real biological data, with different methods helping improve different subsets of gene families.

Notes

  1. ProDA and ProDA_50 could only be run successfully on 183 and 191 gene families, respectively.

Funding

This work was supported in part by NSF award IIS 1553421 to MSB.

Author information

Corresponding author

Correspondence to Mukul S. Bansal.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zaman, S., Bansal, M.S. (2023). Reducing the Impact of Domain Rearrangement on Sequence Alignment and Phylogeny Reconstruction. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_26

  • DOI: https://doi.org/10.1007/978-981-99-7074-2_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7073-5

  • Online ISBN: 978-981-99-7074-2

  • eBook Packages: Computer Science, Computer Science (R0)