Abstract
Recently, a new variation of approximate Boyer-Moore string matching, called FAAST, was presented for the k-mismatch problem. FAAST is specifically tuned for small alphabets. We further improve this algorithm, gaining speedups in both preprocessing and searching. We also present three variations of the algorithm for the k-difference problem. We show that the searching time of the algorithms is average-optimal and that the preprocessing also has a lower time complexity than that of FAAST. Our experiments show that our algorithm for the k-mismatch problem is about 30% faster than FAAST and that the new algorithms compare well against other state-of-the-art algorithms for approximate string matching.
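To make the k-mismatch problem concrete, here is a minimal Python sketch of an approximate Boyer-Moore-Horspool search in the spirit of Tarhio and Ukkonen's approximate Boyer-Moore matching. It is not FAAST and not the algorithm of this paper, and the function names `build_shift_tables` and `k_mismatch_search` are hypothetical. The shift rests on one observation: any occurrence with at most k mismatches in a shifted window must match at least one of the last k+1 characters of the current window. A safe shift is therefore the minimum, over those k+1 characters, of the distance to the nearest equal pattern character further to the left.

```python
from typing import Dict, List


def build_shift_tables(pattern: str, k: int) -> List[Dict[str, int]]:
    """For h = 0..k, map a character c to the smallest s >= 1 such that
    pattern[m-1-h-s] == c. Characters absent from a table get the default
    shift m - h, which moves that text position past the window start."""
    m = len(pattern)
    tables: List[Dict[str, int]] = []
    for h in range(k + 1):
        pos = m - 1 - h  # pattern index aligned with text[j - h] in the current window
        table: Dict[str, int] = {}
        # Walk s from large to small so that the smallest valid shift wins.
        for s in range(pos, 0, -1):
            table[pattern[pos - s]] = s
        tables.append(table)
    return tables


def k_mismatch_search(text: str, pattern: str, k: int) -> List[int]:
    """Return start positions where pattern occurs in text with <= k mismatches.

    Hypothetical sketch of a Tarhio-Ukkonen-style approximate
    Boyer-Moore-Horspool search; not FAAST and not the paper's algorithm."""
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("require 0 <= k < len(pattern)")
    tables = build_shift_tables(pattern, k)
    matches: List[int] = []
    j = m - 1  # index of the last text character in the current window
    while j < n:
        start = j - m + 1
        # Compare right to left, giving up as soon as k + 1 mismatches are seen.
        mismatches = 0
        i = m - 1
        while i >= 0 and mismatches <= k:
            if text[start + i] != pattern[i]:
                mismatches += 1
            i -= 1
        if mismatches <= k:
            matches.append(start)
        # Any occurrence in a shifted window must match one of the last k + 1
        # characters of this window, so the smallest per-character shift is safe.
        j += min(tables[h].get(text[j - h], m - h) for h in range(k + 1))
    return matches


print(k_mismatch_search("ACGTACGTAC", "ACGA", 1))  # [0, 4]
```

With a small alphabet such as DNA, single characters recur often in the pattern, so these per-character shifts tend to be short; the sketch makes no attempt at the small-alphabet tuning the abstract describes.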
References
Arlazarov, V., Dinic, E., Kronrod, M., Faradzev, I.: On economic construction of the transitive closure of a directed graph. Dokl. Acad. Nauk SSSR 194, 487–488 (1970) (in Russian). English translation in Sov. Math. Dokl. 11, 1209–1210 (1975)
Baeza-Yates, R., Gonnet, G.: A new approach to text searching. Commun. ACM 35(10), 74–82 (1992)
Baeza-Yates, R., Gonnet, G.: Fast string matching with mismatches. Inf. Comput. 108(2), 187–199 (1994)
Baeza-Yates, R., Perleberg, C.: Fast and practical approximate string matching. Inf. Process. Lett. 59(1), 21–27 (1996)
Boyer, R., Moore, J.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
Chang, W., Marr, T.: Approximate string matching and local similarity. In: Proceedings of the 5th Symposium on Combinatorial Pattern Matching. LNCS, vol. 807, pp. 259–273. Springer, Berlin (1994)
El-Mabrouk, N., Crochemore, M.: Boyer-Moore strategy to efficient approximate string matching. In: Proceedings of 7th Symposium on Combinatorial Pattern Matching. LNCS, vol. 1075, pp. 24–38. Springer, Berlin (1996)
Fredriksson, K., Navarro, G.: Average-optimal single and multiple approximate string matching. ACM J. Exp. Algorithm. 9(4) (2004)
Horspool, N.: Practical fast searching in strings. Softw. Pract. Experience 10, 501–506 (1980)
Liu, Z., Chen, X., Borneman, J., Jiang, T.: A fast algorithm for approximate string matching on gene sequences. In: Proceedings of 16th Symposium on Combinatorial Pattern Matching. LNCS, vol. 3537, pp. 79–90. Springer, Berlin (2005)
Masek, W., Paterson, M.: A faster algorithm for computing string edit distances. J. Comput. Syst. Sci. 20, 18–31 (1980)
Myers, G.: A fast bit-vector algorithm for approximate string matching based on dynamic programming. J. ACM 46(3), 395–415 (1999)
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
Navarro, G., Raffinot, M.: Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM J. Exp. Algorithm. 5(4) (2000)
Navarro, G., Sutinen, E., Tanninen, J., Tarhio, J.: Indexing text with approximate q-grams. In: Proceedings of 11th Symposium on Combinatorial Pattern Matching. LNCS, vol. 1848, pp. 350–363. Springer, Berlin (2000)
Rice, P., Longden, I., Bleasby, A.: EMBOSS: The European molecular biology open software suite. Trends Genet. 16(6), 276–277 (2000)
Tarhio, J., Ukkonen, E.: Approximate Boyer-Moore string matching. SIAM J. Comput. 22, 243–260 (1993)
Wu, S., Manber, U., Myers, E.: A subquadratic algorithm for approximate limited expression matching. Algorithmica 15(1), 50–67 (1996)
Additional information
Work supported by the Academy of Finland.
An earlier version of this paper appeared in the Proceedings of String Processing and Information Retrieval (SPIRE 2007, Oct. 29–31, 2007), pp. 173–183.
About this article
Cite this article
Salmela, L., Tarhio, J. & Kalsi, P. Approximate Boyer-Moore String Matching for Small Alphabets. Algorithmica 58, 591–609 (2010). https://doi.org/10.1007/s00453-009-9286-3
DOI: https://doi.org/10.1007/s00453-009-9286-3