Regular Article
How Hard Is Computing the Edit Distance?

https://doi.org/10.1006/inco.2000.2914
Under an Elsevier user license
open archive

Abstract

The notion of edit distance arises in very different fields, such as self-correcting codes, parsing theory, speech recognition, and molecular biology. The edit distance between an input string and a language L is the minimum cost of a sequence of edit operations (substitution of a symbol by another, incorrect symbol; insertion of an extraneous symbol; deletion of a symbol) needed to change the input string into a sentence of L. In this paper we study the complexity of computing the edit distance, discovering sharp boundaries between classes of languages for which this function can be efficiently evaluated and classes of languages for which it seems to be difficult to compute. Our main result is a parallel algorithm for computing the edit distance for the class of languages accepted by one-way nondeterministic auxiliary pushdown automata working in polynomial time, a class that strictly contains the context-free languages. Moreover, we show that this algorithm can be extended to find a sentence of the language at minimum distance from the input string.
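As an illustration of the definition only, the following minimal sketch computes the edit distance between a string and a language under two simplifying assumptions that are not taken from the paper: every edit operation costs 1, and the language is finite and given as an explicit collection of sentences. It uses the classic sequential dynamic program for string-to-string distance and takes the minimum over all sentences; it is not the parallel algorithm announced in the abstract. The names `edit_distance` and `distance_to_language` are hypothetical.

```python
from typing import Iterable, Tuple


def edit_distance(source: str, target: str) -> int:
    """Unit-cost edit distance (substitution, insertion, deletion).

    Assumption: every operation costs 1. Classic row-by-row dynamic program.
    """
    # prev[j] = cost of turning the source prefix read so far into target[:j]
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]  # deleting all i symbols read so far yields the empty string
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,             # delete s
                curr[j - 1] + 1,         # insert t
                prev[j - 1] + (s != t),  # substitute s by t (free if equal)
            ))
        prev = curr
    return prev[-1]


def distance_to_language(word: str, language: Iterable[str]) -> Tuple[int, str]:
    """Return (distance, closest sentence) for a finite, non-empty language.

    Assumption: the language is an explicit finite collection of strings.
    Ties are broken by lexicographic order of the sentences.
    """
    return min((edit_distance(word, sentence), sentence) for sentence in language)
```

For example, with the finite language {ab, aabb, aaaabbbb}, the call `distance_to_language("aaabb", ["ab", "aabb", "aaaabbbb"])` evaluates to `(1, "aabb")`: deleting one `a` from the input yields a sentence of the language, while the other two sentences are each three operations away.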

Keywords

formal languages; computational complexity; string correction; error correction; edit distance; dynamic programming


Partially supported by Ministero dell'Università e della Ricerca Scientifica e Tecnologica, under the project "Modelli di calcolo innovativi: metodi sintattici e combinatori." A preliminary version of this work appeared in Fundamentals of Computation Theory (FCT'95), Proceedings, Lecture Notes in Computer Science, Vol. 965, pp. 383-392, Springer-Verlag, Berlin/New York, 1995.
