Abstract
We present three algorithms for exact string matching of multiple patterns. Our algorithms are filtering methods that apply q-grams and bit parallelism. We ran extensive experiments with them and compared them with various versions of earlier algorithms, e.g. different trie implementations of the Aho-Corasick algorithm. Our algorithms proved to be substantially faster than earlier solutions for sets of 1,000–100,000 patterns. The gain is due to the improved filtering efficiency provided by q-grams.
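To illustrate the general idea of q-gram filtering for multiple patterns, here is a minimal sketch. It is not one of the three algorithms of the paper, and it leaves out bit parallelism entirely. It assumes that every pattern has at least q characters and that the filter is a plain hash table keyed by the first q characters of each pattern. The names `build_qgram_index` and `search` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def build_qgram_index(patterns, q):
    """Map the first q characters of each pattern to the patterns sharing them."""
    index = defaultdict(list)
    for p in patterns:
        if len(p) < q:
            # Assumption of this sketch: patterns shorter than q are not supported.
            raise ValueError(f"pattern {p!r} is shorter than q={q}")
        index[p[:q]].append(p)
    return index


def search(text, patterns, q=3):
    """Return (position, pattern) pairs for every exact occurrence in text."""
    index = build_qgram_index(patterns, q)
    matches = []
    for i in range(len(text) - q + 1):
        # Filtering phase: one table lookup on the q-gram starting at i.
        candidates = index.get(text[i:i + q])
        if not candidates:
            continue  # no pattern can start here, so skip verification
        # Verification phase: compare only the patterns that passed the filter.
        for p in candidates:
            if text.startswith(p, i):
                matches.append((i, p))
    return matches


if __name__ == "__main__":
    print(search("abracadabra", ["abra", "cad", "dab"], q=3))
```

A larger q makes a q-gram from the text less likely to coincide with a pattern prefix by chance, so fewer positions reach the verification phase. This is the sense in which q-grams can improve filtering efficiency for large pattern sets.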
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kytöjoki, J., Salmela, L., Tarhio, J. (2003). Tuning String Matching for Huge Pattern Sets. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_16
DOI: https://doi.org/10.1007/3-540-44888-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive