
Approximate Boyer-Moore String Matching for Small Alphabets

Algorithmica

Abstract

Recently, a new variation of approximate Boyer-Moore string matching was presented for the k-mismatch problem. The variation, called FAAST, is specifically tuned for small alphabets. We further improve this algorithm, gaining speedups in both preprocessing and searching. We also present three variations of the algorithm for the k-difference problem. We show that the searching time of the algorithms is average-optimal and that their preprocessing has a lower time complexity than that of FAAST. Our experiments show that our algorithm for the k-mismatch problem is about 30% faster than FAAST and that the new algorithms compare well against other state-of-the-art algorithms for approximate string matching.
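For readers unfamiliar with the underlying technique, the following is a minimal Python sketch of the classic approximate Boyer-Moore shift rule for the k-mismatch problem (report every start position where the pattern occurs with at most k substitutions), in the style of Tarhio and Ukkonen. It is not FAAST and not the improved algorithm of this paper, and the function names `build_shift_tables` and `k_mismatch_search` are hypothetical. The idea it illustrates: at any occurrence with at most k mismatches, at least one of the k+1 text characters at the end of the current window must match the pattern character aligned with it after the shift. The window can therefore advance by the smallest shift that produces such a match, capped at m-k.

```python
def build_shift_tables(pattern: str, k: int) -> dict[int, dict[str, int]]:
    """For each of the last k+1 pattern positions j, map a character a to the
    smallest shift s (1 <= s <= j, s < m-k) with pattern[j-s] == a.
    Characters missing from a table get the default maximum shift m-k."""
    m = len(pattern)
    max_shift = m - k
    tables = {}
    for j in range(m - k - 1, m):
        table = {}
        # Walk s downwards so that smaller (safer) shifts overwrite larger ones.
        for s in range(min(j, max_shift - 1), 0, -1):
            table[pattern[j - s]] = s
        tables[j] = table
    return tables


def k_mismatch_search(text: str, pattern: str, k: int) -> list[int]:
    """Return start positions where pattern occurs in text with at most k mismatches."""
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("this sketch requires 0 <= k < len(pattern)")
    max_shift = m - k
    tables = build_shift_tables(pattern, k)
    matches = []
    pos = 0
    while pos + m <= n:
        # Verify the window right to left, giving up after k+1 mismatches.
        mismatches = 0
        for j in range(m - 1, -1, -1):
            if text[pos + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            matches.append(pos)
        # Any occurrence must match at least one of the window's last k+1
        # characters, so the safe shift is the minimum over those positions.
        pos += min(tables[j].get(text[pos + j], max_shift)
                   for j in range(m - k - 1, m))
    return matches


print(k_mismatch_search("ACGTACGTTACG", "ACGA", 1))  # → [0, 4]
```

With a four-letter alphabet such as DNA, most characters recur in the pattern, so single-character shifts like these tend to be short; that is the general motivation for variants tuned to small alphabets, whose specific techniques this sketch does not attempt to reproduce.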



Author information

Corresponding author

Correspondence to Leena Salmela.

Additional information

Work supported by Academy of Finland.

An earlier version of this paper appeared in the Proceedings of String Processing and Information Retrieval (SPIRE 2007, October 29–31, 2007), pp. 173–183.


About this article

Cite this article

Salmela, L., Tarhio, J. & Kalsi, P. Approximate Boyer-Moore String Matching for Small Alphabets. Algorithmica 58, 591–609 (2010). https://doi.org/10.1007/s00453-009-9286-3


  • DOI: https://doi.org/10.1007/s00453-009-9286-3
