Abstract
Bit-parallelism has yielded fast and practical algorithms for approximate string matching under the Levenshtein edit distance, in which a single edit operation inserts, deletes, or substitutes a character. Depending on the search parameters, the fastest non-filtering algorithms in practice are currently the O(kn⌈m/w⌉) algorithm of Wu & Manber, the O(⌈km/w⌉n) algorithm of Baeza-Yates & Navarro, and the O(⌈m/w⌉n) algorithm of Myers, where m is the pattern length, n is the text length, k is the error threshold, and w is the computer word size. In this paper we discuss a uniform way of modifying each of these algorithms to permit a fourth type of edit operation: transposing two adjacent characters in the pattern. The resulting distance is known as the Damerau edit distance. We also present an experimental comparison of the resulting algorithms.
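To make the fourth edit operation concrete, the following is a minimal sketch of approximate string matching with transposition using plain dynamic programming. It is not one of the bit-parallel algorithms discussed in the paper; it is an O(mn) reference baseline. It assumes the common restricted form of the Damerau distance, in which a transposed pair is not edited further. The function name `damerau_search` and the `allow_transposition` flag are hypothetical and chosen only for illustration.

```python
def damerau_search(pattern, text, k, allow_transposition=True):
    """Return 1-based end positions j in text where some substring ending
    at j is within k edits of pattern.

    Assumption: restricted Damerau distance (insert, delete, substitute,
    and transpose two adjacent characters, with no further edits inside
    a transposed pair).
    """
    m = len(pattern)
    prev2 = None                 # column j-2 of the DP table
    prev = list(range(m + 1))    # column j=0: D[i][0] = i
    matches = []
    for j in range(1, len(text) + 1):
        cur = [0] * (m + 1)      # D[0][j] = 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            best = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # missing pattern character
            )
            # Fourth operation: the last two pattern characters appear
            # swapped in the text.
            if (
                allow_transposition
                and prev2 is not None
                and i >= 2
                and pattern[i - 2] == text[j - 1]
                and pattern[i - 1] == text[j - 2]
            ):
                best = min(best, prev2[i - 2] + 1)
            cur[i] = best
        if cur[m] <= k:
            matches.append(j)
        prev2, prev = prev, cur
    return matches


if __name__ == "__main__":
    # "acbd" is one transposition away from "abcd".
    print(damerau_search("abcd", "xxacbdxx", 1))
    print(damerau_search("abcd", "xxacbdxx", 1, allow_transposition=False))
```

With k = 1, the pattern "abcd" is found ending at position 6 of "xxacbdxx" when transposition is allowed. Under plain Levenshtein distance the same occurrence costs two edits, so it is not reported.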
References
Baeza-Yates, R., Navarro, G.: Faster approximate string matching. Algorithmica 23(2), 127–158 (1999)
Damerau, F.: A technique for computer detection and correction of spelling errors. Comm. of the ACM 7(3), 171–176 (1964)
Du, M.W., Chang, S.C.: A model and a fast algorithm for multiple errors spelling correction. Acta Informatica 29, 281–302 (1992)
Harman, D.: Overview of the Third Text REtrieval Conference. In: Proc. Third Text REtrieval Conference (TREC-3), pp. 1–19. NIST Special Publication 500-207 (1995)
Kukich, K.: Techniques for automatically correcting words in text. ACM Computing Surveys 24(4), 377–439 (1992)
Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10(8), 707–710 (1966); Original in Russian in Doklady Akademii Nauk SSSR 163(4), 845–848 (1965)
Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM 46(3), 395–415 (1999)
Navarro, G.: A guided tour to approximate string matching. ACM Computing Surveys 33(1), 31–88 (2001)
Navarro, G.: NR-grep: a fast and flexible pattern matching tool. Software Practice and Experience 31, 1265–1312 (2001)
Navarro, G., Baeza-Yates, R.: Improving an algorithm for approximate pattern matching. Algorithmica 30(4), 473–502 (2001)
Sellers, P.: The theory and computation of evolutionary distances: pattern recognition. J. of Algorithms 1, 359–373 (1980)
Ukkonen, E.: Algorithms for approximate string matching. Information and Control 64, 100–118 (1985)
Ukkonen, E.: Finding approximate patterns in strings. J. of Algorithms 6, 132–137 (1985)
Wright, A.: Approximate string matching using within-word parallelism. Software Practice and Experience 24(4), 337–362 (1994)
Wu, S., Manber, U.: Fast text searching allowing errors. Comm. of the ACM 35(10), 83–91 (1992)
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hyyrö, H. (2003). Bit-Parallel Approximate String Matching Algorithms with Transposition. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2003. Lecture Notes in Computer Science, vol 2857. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39984-1_8
DOI: https://doi.org/10.1007/978-3-540-39984-1_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20177-9
Online ISBN: 978-3-540-39984-1
eBook Packages: Springer Book Archive