Abstract
Existing computational approaches for studying gene family evolution generally do not account for domain rearrangement within gene families. However, it is well known that protein domain architectures often differ between genes belonging to the same gene family. In particular, domain shuffling can produce out-of-order domains that, unless explicitly accounted for, can significantly degrade even the most fundamental tasks, such as multiple sequence alignment and phylogeny inference.
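The following is a minimal illustrative sketch, not the authors' method, showing why out-of-order domains are a problem for alignment. It makes several assumptions:

- Each protein is reduced to its ordered list of domain labels.
- Each label occurs at most once per architecture.
- The function names and example labels are hypothetical.

A standard colinear aligner such as MUSCLE can only match homologous regions that appear in the same relative order. The domains it can align between two proteins are therefore bounded by a longest common subsequence (LCS) of their architectures. Any shared domain that falls outside the LCS is out of order and cannot be aligned to its homolog.

```python
from typing import List, Tuple


def colinear_matches(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Index pairs (i, j) of one longest common subsequence of two domain
    architectures. An order-preserving (colinear) aligner can match at most
    this many domains between the two proteins."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back from the start to recover one optimal set of matches.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def unalignable_domains(a: List[str], b: List[str]) -> List[str]:
    """Domains of `a` that also occur in `b` but fall outside the colinear
    matching, i.e. domains whose relative order differs between a and b.
    Assumes each domain label occurs at most once per architecture."""
    matched = {i for i, _ in colinear_matches(a, b)}
    shared = set(b)
    return [d for i, d in enumerate(a) if d in shared and i not in matched]
```

For instance, `unalignable_domains(["SH3", "SH2", "Kinase"], ["SH2", "SH3", "Kinase"])` returns `["SH3"]`. Only two of the three shared domains can be aligned colinearly, and which one is left out depends on tie-breaking in the traceback.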
In this work, we make progress towards addressing this important but often overlooked problem. Specifically, we (i) demonstrate the impact of protein domain shuffling and rearrangement on multiple sequence alignment and gene tree reconstruction accuracy, (ii) propose two new computational methods for correcting gene sequences and alignments to improve gene tree reconstruction accuracy and evaluate them using realistically simulated datasets, and (iii) assess the potential impact in practice of our new methods and of two existing approaches, MDAT and ProDA, by applying them to biological gene families. We find that the methods work very well on simulated data but that the performance of all methods is mixed, and often complementary, on real biological data, with different methods improving different subsets of gene families.
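The abstract does not state how gene tree reconstruction accuracy is measured. As an assumption, based on Robinson and Foulds appearing in the reference list, the sketch below computes the Robinson-Foulds (RF) distance in its commonly used form: the number of non-trivial bipartitions present in one tree but not the other. Trees are hypothetical nested tuples of leaf names and are treated as unrooted. This is an illustrative helper, not the evaluation code used in the paper.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a leaf name or a tuple of subtrees, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(t: Tree) -> FrozenSet[str]:
    """All leaf names below t."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaf_set(c) for c in t))


def splits(t: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions of t, treating the tree as unrooted.
    Each split is stored as the side that excludes a fixed reference leaf,
    so both clades below a root produce the same representation."""
    all_leaves = leaf_set(t)
    ref = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            clade = frozenset([node])
        else:
            clade = frozenset().union(*(walk(c) for c in node))
        side = clade if ref not in clade else all_leaves - clade
        # Skip the empty side and splits that separate off a single leaf.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    walk(t)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Number of non-trivial splits found in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same leaf set")
    return len(splits(t1) ^ splits(t2))
```

For example, `rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")))` returns 2, while comparing a tree with itself returns 0. Some studies report RF/2 or a normalized value instead; that choice is not specified here.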
Notes
1. ProDA and ProDA_50 could only be run successfully on 183 and 191 gene families, respectively.
References
Krogh, A., Eddy, S., Durbin, R.M.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press (1998)
Baker, E.P., et al.: Evolution of host-microbe cell adherence by receptor domain shuffling. eLife 11 (2022)
Bansal, M.S., Kellis, M., Kordi, M., Kundu, S.: RANGER-DTL 2.0: rigorous reconstruction of gene-family evolution by duplication, transfer and loss. Bioinformatics 34(18), 3214–3216 (2018)
Björklund, A.K., Ekman, D., Light, S., Frey-Skött, J., Elofsson, A.: Domain rearrangements in protein evolution. J. Mol. Biol. 353(4), 911–923 (2005)
Blum, M., Chang, H.Y., Chuguransky, S., et al.: The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49(D1), D344–D354 (2020)
Choudhuri, S.: Chapter 2: Fundamentals of molecular evolution. In: Choudhuri, S. (ed.) Bioinformatics for Beginners, pp. 27–53. Academic Press, Oxford (2014)
Cohen-Gihon, I., Sharan, R., Nussinov, R.: Processes of fungal proteome evolution and gain of function: gene duplication and domain rearrangement. Phys. Biol. 8(3), 035009 (2011)
Dohmen, E., Klasberg, S., Bornberg-Bauer, E., Perrey, S., Kemena, C.: The modular nature of protein evolution: domain rearrangement rates across eukaryotic life. BMC Evol. Biol. 20(1), 30 (2020)
Edgar, R.C.: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 5, 113 (2004)
Ekman, D., Björklund, Å.K., Frey-Skött, J., Elofsson, A.: Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions. J. Mol. Biol. 348(1), 231–243 (2005)
Forslund, K., Sonnhammer, E.L.L.: Evolution of protein domain architectures. In: Anisimova, M. (ed.) Evolutionary Genomics: Statistical and Computational Methods, vol. 2, pp. 187–216. Humana Press, Totowa, NJ (2012)
Han, J.H., Batey, S., Nickson, A.A., Teichmann, S.A., Clarke, J.: The folding and evolution of multidomain proteins. Nat. Rev. Mol. Cell Biol. 8, 319–330 (2007)
Kemena, C., Bitard-Feildel, T., Bornberg-Bauer, E.: MDAT: aligning multiple domain arrangements. BMC Bioinform. 16, 19 (2015)
Kundu, S., Bansal, M.S.: SaGePhy: an improved phylogenetic simulation framework for gene and subgene evolution. Bioinformatics 35(18), 3496–3498 (2019)
Le, S.Q., Gascuel, O.: An improved general amino acid replacement matrix. Mol. Biol. Evol. 25(7), 1307–1320 (2008)
Li, L., Bansal, M.S.: An integrated reconciliation framework for domain, gene, and species level evolution. IEEE/ACM Trans. Comput. Biol. Bioinform. 16(1), 63–76 (2019)
Marsh, J.A., Teichmann, S.A.: How do proteins gain new domains? Genome Biol. 11(7), 126 (2010)
Mi, H., Thomas, P.: PANTHER pathway: an ontology-based pathway database coupled with data analysis tools. Methods Mol. Biol. 563, 123–140 (2009)
Mistry, J., et al.: Pfam: the protein families database in 2021. Nucleic Acids Res. 49(D1), D412–D419 (2021)
Miyata, T., Suga, H.: Divergence pattern of animal gene families and relationship with the Cambrian explosion. BioEssays 23(11), 1018–1027 (2001)
Paysan-Lafosse, T., Blum, M., Chuguransky, S., et al.: InterPro in 2022. Nucleic Acids Res. 51(D1), D418–D427 (2023)
Phuong, T.M., Do, C.B., Edgar, R.C., Batzoglou, S.: Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Res. 34(20), 5932–5942 (2006)
Raphael, B., Zhi, D., Tang, H., Pevzner, P.: A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Res. 14(11), 2336–2346 (2004)
Robinson, D.F., Foulds, L.R.: Comparison of phylogenetic trees. Math. Biosci. 53(1), 131–147 (1981)
Sato, P.M., Yoganathan, K., Jung, J.H., Peisajovich, S.G.: The robustness of a signaling complex to domain rearrangements facilitates network evolution. PLoS Biol. 12(12), e1002012 (2014)
Schultz, J., Copley, R.R., Doerks, T., Ponting, C.P., Bork, P.: SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 28(1), 231–234 (2000)
Stamatakis, A.: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9), 1312–1313 (2014)
Tordai, H., Nagy, A., Farkas, K., Banyai, L., Patthy, L.: Modules, multidomain proteins and organismic complexity. FEBS J. 272(19), 5064–5078 (2005)
Vogel, C., Bashton, M., Kerrison, N.D., Chothia, C., Teichmann, S.A.: Structure, function and evolution of multidomain proteins. Curr. Opin. Struct. Biol. 14(2), 208–216 (2004)
Funding
This work was supported in part by NSF award IIS 1553421 to MSB.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zaman, S., Bansal, M.S. (2023). Reducing the Impact of Domain Rearrangement on Sequence Alignment and Phylogeny Reconstruction. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_26
DOI: https://doi.org/10.1007/978-981-99-7074-2_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7073-5
Online ISBN: 978-981-99-7074-2
eBook Packages: Computer Science, Computer Science (R0)