Abstract
Given two strings X and Y, both of length O(n) over an alphabet Σ, a basic problem (local alignment) is to find pairs of similar substrings, one from X and one from Y. For substrings X′ and Y′ of X and Y, respectively, we measure their similarity by the normalized alignment value LCS(X′,Y′)/(|X′|+|Y′|). Given an integer M, we consider only those substring pairs whose LCS length is at least M. We present an algorithm that reports the pairs of substrings with the highest normalized alignment value in O(n log|Σ| + rM log log n) time, where r is the number of matches between X and Y. We also present an O(n log|Σ| + rL log log n) algorithm, where L = LCS(X,Y), that reports all substring pairs with a normalized alignment value above a given threshold.
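To make the objective concrete, here is a minimal brute-force sketch of the two reporting problems. It is not the sparse algorithm summarized above: it simply scores every substring pair, which takes O(|X|²|Y|²) time, so it is only useful as a reference for checking small inputs. All function names are hypothetical. It assumes that "matches" means index pairs (i, j) with X[i] = Y[j], the usual meaning in sparse LCS work. Substrings are given as half-open index ranges (i, j, k, l), meaning X′ = x[i:j] and Y′ = y[k:l]. Because LCS(X′,Y′) ≤ min(|X′|,|Y′|), the value never exceeds 1/2. A single matching character already reaches that maximum, which is why the length bound M is needed.

```python
from collections import Counter
from fractions import Fraction


def lcs_length(a: str, b: str) -> int:
    """Textbook O(|a|*|b|) dynamic program for the LCS length."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def normalized_value(xs: str, ys: str) -> Fraction:
    """LCS(X', Y') / (|X'| + |Y'|), kept exact with Fraction."""
    return Fraction(lcs_length(xs, ys), len(xs) + len(ys))


def count_matches(x: str, y: str) -> int:
    """r (assumed meaning): number of index pairs (i, j) with x[i] == y[j]."""
    cx, cy = Counter(x), Counter(y)
    return sum(cnt * cy[c] for c, cnt in cx.items())


def scored_pairs(x: str, y: str, m: int):
    """Yield ((i, j, k, l), value) for every pair X' = x[i:j], Y' = y[k:l]
    whose LCS length is at least m. For each start (i, k), one DP table
    d[a][b] = LCS(x[i:i+a], y[k:k+b]) scores all pairs with that start."""
    for i in range(len(x)):
        for k in range(len(y)):
            rows, cols = len(x) - i, len(y) - k
            d = [[0] * (cols + 1) for _ in range(rows + 1)]
            for a in range(1, rows + 1):
                for b in range(1, cols + 1):
                    if x[i + a - 1] == y[k + b - 1]:
                        d[a][b] = d[a - 1][b - 1] + 1
                    else:
                        d[a][b] = max(d[a - 1][b], d[a][b - 1])
                    if d[a][b] >= m:
                        yield (i, i + a, k, k + b), Fraction(d[a][b], a + b)


def best_pairs(x: str, y: str, m: int):
    """Highest value among pairs with LCS >= m, plus every pair attaining it.
    Returns (None, []) when no pair meets the length bound."""
    best, winners = None, []
    for pair, value in scored_pairs(x, y, m):
        if best is None or value > best:
            best, winners = value, [pair]
        elif value == best:
            winners.append(pair)
    return best, winners


def pairs_above(x: str, y: str, threshold: Fraction, m: int = 1):
    """All pairs whose value is strictly above threshold. The hypothetical
    default m=1 only skips pairs that share no character."""
    return [pair for pair, value in scored_pairs(x, y, m) if value > threshold]
```

For example, `best_pairs("abc", "abc", 2)` returns the value 1/2, attained by the three pairs of identical substrings of length at least 2: `(0, 2, 0, 2)`, `(1, 3, 1, 3)` and `(0, 3, 0, 3)`. Exact `Fraction` arithmetic is used so that ties in the normalized value are detected reliably instead of depending on floating-point rounding.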
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Efraty, N., Landau, G.M. (2004). Sparse Normalized Local Alignment. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_25
DOI: https://doi.org/10.1007/978-3-540-27801-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive