Abstract
Genome rearrangement distance problems make it possible to estimate the evolutionary distance between genomes. These problems aim to compute the minimum number of mutations, called rearrangement events, necessary to transform one genome into another. Two commonly studied rearrangements are the reversal, which inverts a sequence of genes, and the transposition, which exchanges two consecutive sequences of genes. Seminal works on this topic focused on the sequence of genes and assumed that each gene occurs exactly once in each genome. More realistic models assume that a gene may have multiple copies or may appear in only one of the genomes. Other models also take into account the nucleotides between consecutive pairs of genes, which are called intergenic regions. This work combines all these generalizations by defining the signed intergenic reversal distance (SIRD), the signed intergenic reversal and transposition distance (SIRTD), the signed intergenic reversal and indels distance (SIRID), and the signed intergenic reversal, transposition, and indels distance (SIRTID) problems. We show a relation between these problems and the signed minimum common intergenic string partition (SMCISP) problem. From this relation, we derive \(\varTheta (k)\)-approximation algorithms for the SIRD and the SIRTD problems, where \(k\) is the maximum number of copies of a gene in the genomes. These algorithms also work as heuristics for the SIRID and SIRTID problems. Additionally, we present some parametrized algorithms for SMCISP that ensure constant approximation factors for the distance problems. Our experimental tests on simulated genomes show an improvement in the rearrangement distances when the partition algorithms are used.
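To make the two rearrangement events concrete, the following is a minimal Python sketch, not the authors' algorithms. It assumes a genome is represented as a list of signed integers (the sign encodes gene orientation, and a repeated absolute value is a repeated gene), and it deliberately ignores intergenic regions and indels. The function names `signed_reversal`, `transposition`, and `max_copies` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import List


def signed_reversal(genome: List[int], i: int, j: int) -> List[int]:
    """Invert the segment genome[i..j] (inclusive) and flip each gene's sign.

    Assumption: genes are signed integers; intergenic sizes are not modeled.
    """
    if not 0 <= i <= j < len(genome):
        raise ValueError("need 0 <= i <= j < len(genome)")
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def transposition(genome: List[int], i: int, j: int, k: int) -> List[int]:
    """Exchange the consecutive segments genome[i:j] and genome[j:k]."""
    if not 0 <= i < j < k <= len(genome):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def max_copies(genome: List[int]) -> int:
    """Maximum number of copies of any gene, ignoring orientation."""
    return max(Counter(abs(g) for g in genome).values())


genome = [1, 2, -3, 2, 4]          # gene 2 occurs twice
after_reversal = signed_reversal(genome, 1, 2)
after_transposition = transposition(genome, 0, 2, 4)
copies = max_copies(genome)
```

With this input, the reversal of positions 1 to 2 gives `[1, 3, -2, 2, 4]`, the transposition of segments `[1, 2]` and `[-3, 2]` gives `[-3, 2, 1, 2, 4]`, and the maximum number of copies is 2, which corresponds to the parameter \(k\) in the approximation factor above.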












Data Availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
This work was supported by the National Council of Technological and Scientific Development, CNPq (grant 202292/2020-7), the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and the São Paulo Research Foundation, FAPESP (grants 2013/08293-7, 2015/11937-9, 2017/12646-3, and 2021/13824-8).
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
A preliminary version of this work appeared in the Proceedings of the 14th International Conference on Bioinformatics and Computational Biology (BICoB 2022).
About this article
Cite this article
Siqueira, G., Alexandrino, A.O. & Dias, Z. Signed rearrangement distances considering repeated genes, intergenic regions, and indels. J Comb Optim 46, 16 (2023). https://doi.org/10.1007/s10878-023-01083-w
DOI: https://doi.org/10.1007/s10878-023-01083-w