Abstract
The Aho-Corasick algorithm is a classic method for matching a set of strings. However, the huge memory usage of the Aho-Corasick automaton prevents it from being applied to large-scale pattern sets. Here we present a simple but efficient table compression method to reduce the automaton's space. The method is based on equivalent-row elimination: it groups state rows into equivalence classes and eliminates the duplicates. Experiments demonstrate that the proposed method significantly reduces memory usage while still searching in linear time, at a speed comparable to that of the extended Aho-Corasick algorithm. Our method provides a good trade-off between memory usage and searching time.
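The idea of equivalent-row elimination can be illustrated with a minimal sketch. It is only an assumption about the general technique, not the authors' actual scheme, which is described in the full paper. The sketch builds a dense transition table for a full (extended) Aho-Corasick automaton, stores each distinct row once, and keeps one row index per state. The function names `build_dfa`, `compress_rows`, and `search` are hypothetical.

```python
from collections import deque

def build_dfa(patterns, alphabet):
    """Build a dense transition table: one row per state, one column per symbol."""
    sym = {c: i for i, c in enumerate(alphabet)}
    table = [[-1] * len(alphabet)]      # state 0 is the root
    out = [set()]                       # patterns recognised in each state
    for p in patterns:                  # phase 1: insert patterns into a trie
        s = 0
        for c in p:
            j = sym[c]
            if table[s][j] == -1:
                table.append([-1] * len(alphabet))
                out.append(set())
                table[s][j] = len(table) - 1
            s = table[s][j]
        out[s].add(p)
    fail = [0] * len(table)
    queue = deque()
    for j in range(len(alphabet)):      # phase 2: complete the root row
        t = table[0][j]
        if t == -1:
            table[0][j] = 0
        else:
            queue.append(t)
    while queue:                        # phase 3: breadth-first completion
        s = queue.popleft()
        for j in range(len(alphabet)):
            t = table[s][j]
            if t == -1:
                # Missing edge: copy the transition of the failure state.
                table[s][j] = table[fail[s]][j]
            else:
                fail[t] = table[fail[s]][j]
                out[t] |= out[fail[t]]
                queue.append(t)
    return table, out, sym

def compress_rows(table):
    """Store each distinct row once; row_of[state] indexes into unique_rows."""
    unique_rows, row_of, seen = [], [], {}
    for row in table:
        key = tuple(row)
        if key not in seen:
            seen[key] = len(unique_rows)
            unique_rows.append(key)
        row_of.append(seen[key])
    return unique_rows, row_of

def search(text, unique_rows, row_of, out, sym):
    """Scan text with the compressed table; one extra lookup per character."""
    s, hits = 0, []
    for i, c in enumerate(text):
        s = unique_rows[row_of[s]][sym[c]]
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

In this sketch the search still performs a constant amount of work per input character, so the scan stays linear in the text length. The saving comes from states whose rows coincide, for example a state with no outgoing trie edges, whose row is identical to that of its failure state.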
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, Y., Yang, Y., Liu, P., Tan, J. (2009). A Table Compression Method for Extended Aho-Corasick Automaton. In: Maneth, S. (eds) Implementation and Application of Automata. CIAA 2009. Lecture Notes in Computer Science, vol 5642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02979-0_12
DOI: https://doi.org/10.1007/978-3-642-02979-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02978-3
Online ISBN: 978-3-642-02979-0
eBook Packages: Computer Science, Computer Science (R0)