An algorithm with linear expected running time for string editing with substitutions and substring reversals

https://doi.org/10.1016/j.ipl.2007.11.017

Abstract

The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y without performing multiple operations that involve the same position. Ordinarily, string editing is based on character insert, delete, and substitute operations. Motivated by the facts that substring reversals are observed in genomic sequences and that it is not always possible to transform a given sequence X into a given sequence Y by reversals alone (e.g., X is all 0's and Y is all 1's), Muthukrishnan and Sahinalp [S. Muthukrishnan, S.C. Sahinalp, Approximate nearest neighbors and sequence comparison with block operations, in: Proc. ACM Symposium on Theory of Computing (STOC), 2000, pp. 416–424; S. Muthukrishnan, S.C. Sahinalp, An improved algorithm for sequence comparison with block reversals, Theoretical Computer Science 321 (1) (2004) 95–101] considered a “simple” well-defined edit distance model in which the edit operations are: replace a character, and reverse and replace a substring. A substring of X can be reversed only if the reversal results in a match in the same position in Y. The cost of each character replacement and substring reversal is 1. The distance in this case is defined only when |X| = |Y| = n. There is an algorithm for computing the distance in this model with worst-case time complexity O(n log² n) [S. Muthukrishnan, S.C. Sahinalp, An improved algorithm for sequence comparison with block reversals, Theoretical Computer Science 321 (1) (2004) 95–101]. We present a dynamic programming algorithm whose worst-case time complexity is O(n²) but whose expected running time is O(n). In our dynamic programming solution, the weights of edit operations can vary for different substitutions, and the costs of reversals can be a function of the reversal length.
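To make the model concrete, the following is a minimal Python sketch of a plain quadratic dynamic program for the unit-cost version described above. It is not the paper's algorithm and does not achieve the linear expected running time; it only illustrates the recurrence over non-overlapping operations. The function name `reversal_edit_distance` and the table names `rev` and `dist` are hypothetical. The sketch also assumes one reading of the abstract: a reversal is allowed only when the reversed substring of X matches Y exactly at the same positions, and only substrings of length at least 2 are considered for reversal.

```python
def reversal_edit_distance(x: str, y: str) -> int:
    """Unit-cost distance with character replacements and substring reversals.

    Illustrative O(n^2) time and space sketch, not the paper's method.
    """
    if len(x) != len(y):
        raise ValueError("the distance is defined only for strings of equal length")
    n = len(x)

    # rev[i][j] is True when reversing x[i..j] (inclusive) yields y[i..j].
    # A reversal moves x[i+k] to position j-k, so every x[i+k] must equal y[j-k].
    rev = [[False] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            ends_ok = x[i] == y[j] and x[j] == y[i]
            inner_ok = length <= 2 or rev[i + 1][j - 1]
            rev[i][j] = ends_ok and inner_ok

    # dist[j] is the minimum cost of transforming x[:j] into y[:j].
    dist = [0] * (n + 1)
    for j in range(1, n + 1):
        # Option 1: handle position j-1 alone (free if it already matches).
        best = dist[j - 1] + (x[j - 1] != y[j - 1])
        # Option 2: finish with one reversal of x[i..j-1], length at least 2.
        for i in range(0, j - 1):
            if rev[i][j - 1]:
                best = min(best, dist[i] + 1)
        dist[j] = best
    return dist[n]
```

For instance, `reversal_edit_distance("abc", "cba")` costs one reversal, while the all-0's versus all-1's example from the abstract needs one replacement per position. Variable substitution weights or length-dependent reversal costs, as mentioned in the abstract, could be introduced by replacing the two unit costs in the inner loop with cost functions.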
