Tuning String Matching for Huge Pattern Sets

  • Conference paper
Combinatorial Pattern Matching (CPM 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2676)

Abstract

We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods that apply q-grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g. different trie implementations of the Aho-Corasick algorithm. Our algorithms proved to be substantially faster than earlier solutions for sets of 1,000–100,000 patterns. The gain is due to the improved filtering efficiency achieved with q-grams.
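To make the idea of a q-gram filter with bit parallelism concrete, the following is a minimal sketch and not the algorithms of the paper, whose details are not given in the abstract. It assumes that all patterns have the same length m, reads the text as overlapping q-grams, and runs a shift-and automaton over them; every candidate that passes the filter is verified against the pattern set. The names build_masks and search are hypothetical and exist only for this illustration.

```python
def build_masks(patterns, q):
    """Map each q-gram to a bit mask: bit j is set if some pattern
    contains that q-gram starting at position j."""
    m = len(patterns[0])
    if any(len(p) != m for p in patterns) or not 1 <= q <= m:
        raise ValueError("sketch assumes equal-length patterns and 1 <= q <= m")
    masks = {}
    for p in patterns:
        for j in range(m - q + 1):
            gram = p[j:j + q]
            masks[gram] = masks.get(gram, 0) | (1 << j)
    return masks, m


def search(text, patterns, q=2):
    """Return (start, pattern) pairs for all exact occurrences in text."""
    masks, m = build_masks(patterns, q)
    pattern_set = set(patterns)      # used only in the verification step
    accept = 1 << (m - q)            # bit of the last q-gram position
    state = 0
    hits = []
    for i in range(len(text) - q + 1):
        # Shift-and step: extend every partial match by one q-gram.
        state = ((state << 1) | 1) & masks.get(text[i:i + q], 0)
        if state & accept:
            # The filter reports a candidate; it may be a false positive,
            # because q-grams of different patterns can be mixed.
            start = i - (m - q)
            candidate = text[start:start + m]
            if candidate in pattern_set:
                hits.append((start, candidate))
    return hits


if __name__ == "__main__":
    # "abcy" passes the filter (ab, bc, cy all occur at the right
    # positions in some pattern) but is rejected by verification.
    print(search("xxabcdxxabcyxbcy", ["abcd", "xbcy"], q=2))
```

In this sketch a larger q makes each mask more selective, so fewer candidates reach the verification step; this is one way to read the abstract's remark that the gain comes from improved filtering efficiency. A real implementation for huge pattern sets would likely hash the q-grams into a fixed-size table and use a faster verification structure than a Python set.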

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kytöjoki, J., Salmela, L., Tarhio, J. (2003). Tuning String Matching for Huge Pattern Sets. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_16

  • DOI: https://doi.org/10.1007/3-540-44888-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive
