Abstract
A pairwise sequence alignment is a structure describing a set of editing operations that transforms one given sequence into another given sequence. We consider insertion, deletion, and substitution of symbols as editing operations. Given a fixed function assigning a weight to each editing operation, the weight of an alignment A is the sum of the weights of the editing operations described by A. Needleman and Wunsch proposed an algorithm for finding a pairwise sequence alignment of minimum weight. However, a sequence of editing operations that transforms one sequence into another cannot always be represented by an alignment. We present a more general structure that can represent any sequence of editing operations that transforms one sequence into another. We also show how to find, in quadratic time, a minimum-weight sequence of editing operations that transforms one sequence into another, even when that sequence cannot be represented by an alignment. Additionally, we show that no algorithm solves the problem in strongly subquadratic time unless SETH is false. This approach may be used to explain non-trivial evolutionary models in molecular biology in which the triangle inequality does not hold for the distance between sequences, such as those involving adaptive and back mutations.
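With an arbitrary weight function, a sequence of editing operations can be cheaper than every alignment, because an alignment applies at most one operation to each position. The Python sketch below illustrates that gap; it is our own illustration under stated assumptions, not the authors' construction. It assumes nonnegative weights stored in a dictionary keyed by symbol pairs, with a hypothetical gap symbol `-` so that `(x, "-")` is a deletion and `("-", y)` an insertion. It first applies the Floyd–Warshall shortest-path procedure over the alphabet plus the gap symbol (one way to account for chains such as substituting A by C and then C by G), then computes a minimum-weight alignment with the Needleman–Wunsch recurrence. All function names and example weights are hypothetical.

```python
from itertools import product

GAP = "-"  # hypothetical symbol meaning "no symbol": (x, GAP) deletes x, (GAP, y) inserts y


def weight_table(alphabet, w):
    """Direct weights over alphabet + GAP; identity costs 0, missing pairs are forbidden."""
    symbols = list(alphabet) + [GAP]
    inf = float("inf")
    return {
        (x, y): 0 if x == y else w.get((x, y), inf)
        for x, y in product(symbols, repeat=2)
    }


def close_weights(table):
    """Floyd-Warshall closure: cheapest chain of single operations turning x into y.

    Assumes nonnegative weights (so there are no negative cycles).
    """
    symbols = sorted({x for x, _ in table})
    d = dict(table)  # copy so the direct table is left unchanged
    for k in symbols:
        for i in symbols:
            for j in symbols:
                via_k = d[(i, k)] + d[(k, j)]
                if via_k < d[(i, j)]:
                    d[(i, j)] = via_k
    return d


def min_alignment_weight(s, t, cost):
    """Needleman-Wunsch recurrence: minimum total weight of an alignment of s and t."""
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + cost[(s[i - 1], GAP)]  # delete a prefix of s
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + cost[(GAP, t[j - 1])]  # insert a prefix of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + cost[(s[i - 1], t[j - 1])],  # match or substitution
                D[i - 1][j] + cost[(s[i - 1], GAP)],  # deletion of s[i-1]
                D[i][j - 1] + cost[(GAP, t[j - 1])],  # insertion of t[j-1]
            )
    return D[n][m]


# Hypothetical weights: every substitution costs 5, except A->C and C->G, which cost 1;
# each insertion or deletion costs 3. The pair A->G violates the triangle inequality.
alphabet = "ACG"
w = {}
for x in alphabet:
    w[(x, GAP)] = 3
    w[(GAP, x)] = 3
    for y in alphabet:
        if x != y:
            w[(x, y)] = 5
w[("A", "C")] = 1
w[("C", "G")] = 1

raw = weight_table(alphabet, w)
closed = close_weights(raw)
print(min_alignment_weight("AA", "GG", raw))     # 10.0: best alignment under direct weights
print(min_alignment_weight("AA", "GG", closed))  # 4.0: cheapest sequence of operations
```

With these weights the direct substitution A→G costs 5, while the chain A→C→G costs 2, so the best alignment of "AA" with "GG" under the direct weights (10) is heavier than the cheapest sequence of operations (4). For a fixed alphabet the closure step takes constant time, so the overall running time of this sketch stays quadratic in the sequence lengths.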
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)
Araujo, E., Martinez, F.V., Higa, C.H.A., Soares, J.: Matrices inducing generalized metric on sequences. Discrete Appl. Math. (2023, to appear)
Araujo, E., Rozante, L.C., Rubert, D.P., Martinez, F.V.: Algorithms for normalized multiple sequence alignments. In: Proceedings of ISAAC. LIPIcs, vol. 212, pp. 40:1–40:16 (2021)
Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In: Proceedings of STOC, pp. 51–58 (2015)
Barton, C., Flouri, T., Iliopoulos, C.S., Pissis, S.P.: Global and local sequence alignment with a bounded number of gaps. Theor. Comput. Sci. 582, 1–16 (2015)
Chaurasiya, R.K., Londhe, N.D., Ghosh, S.: A novel weighted edit distance-based spelling correction approach for improving the reliability of Devanagari script-based P300 speller system. IEEE Access 4, 8184–8198 (2016)
Chenna, R., et al.: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31(13), 3497–3500 (2003)
Fisman, D., Grogin, J., Margalit, O., Weiss, G.: The normalized edit distance with uniform operation costs is a metric. arXiv:2201.06115 (2022)
Floyd, R.: Algorithm 97: shortest path. Commun. ACM 5(6), 345 (1962)
Foster, P.: Adaptive mutation in Escherichia coli. J. Bacteriol. 186(15), 4846–4852 (2004)
de la Higuera, C., Micó, L.: A contextual normalised edit distance. In: Proceedings of ICDEW, pp. 354–361. IEEE (2008)
Ichinose, M., Iizuka, M., Kusumi, J., Takefu, M.: Models of compensatory molecular evolution: effects of back mutation. J. Theor. Biol. 323, 1–10 (2013)
Karplus, K., Barrett, C., Hughey, R.: Hidden Markov models for detecting remote protein homologies. Bioinformatics 14(10), 846–856 (1998)
Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Doklady 10(8), 707–710 (1966)
Lipman, D.J., Altschul, S.F., Kececioglu, J.D.: A tool for multiple sequence alignment. PNAS 86(12), 4412–4415 (1989)
Lipman, D.J., Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227(4693), 1435–1441 (1985)
Marzal, A., Vidal, E.: Computation of normalized edit distance and applications. IEEE T. Pattern Anal. 15(9), 926–932 (1993)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)
Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302(1), 205–217 (2000)
Rosenberg, S.: Evolving responsively: adaptive mutation. Nat. Rev. Genet. 2, 504–515 (2001)
Setubal, J.C., Meidanis, J.: Introduction to Computational Molecular Biology. PWS Pub. (1997)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)
Sun, Y., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling-RRC-LSVT. In: Proceedings of ICDAR, pp. 1557–1562. IEEE (2019)
Warshall, S.: A theorem on Boolean matrices. J. ACM 9(1), 11–12 (1962)
Yujian, L., Bo, L.: A normalized Levenshtein distance metric. IEEE T. Pattern Anal. 29(6), 1091–1095 (2007)
Acknowledgments
The authors thank José Augusto Ramos Soares, Said Sadique Adi, and Vagner Pedrotti for valuable discussions on this topic.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Araujo, E., Martinez, F.V., Rozante, L.C., Almeida, N.F. (2023). Extended Pairwise Sequence Alignment. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023. ICCSA 2023. Lecture Notes in Computer Science, vol 13956. Springer, Cham. https://doi.org/10.1007/978-3-031-36805-9_15
DOI: https://doi.org/10.1007/978-3-031-36805-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36804-2
Online ISBN: 978-3-031-36805-9
eBook Packages: Computer Science, Computer Science (R0)