An Efficient Linear Pseudo-minimization Algorithm for Aho-Corasick Automata

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7354)

Abstract

A classical construction of Aho and Corasick solves the pattern matching problem for a finite set of words X in linear time, where the size of the input X is the sum of the lengths of its elements. It produces an automaton that recognizes \(A^*X\), where A is a finite alphabet, but this automaton is generally not minimal. As an alternative to classical minimization algorithms, which yield an \({\mathcal O}(n\log n)\) solution to the problem, we propose a linear pseudo-minimization algorithm specific to Aho-Corasick automata. It produces an automaton whose size lies between that of the input automaton and that of its associated minimal automaton. Moreover, this algorithm generically computes the minimal automaton: for a large variety of natural distributions, the probability that the output is the minimal automaton of \(A^*X\) tends to one as the size of X tends to infinity.
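To make the starting point of the abstract concrete, here is a minimal Python sketch. It is our illustration, not the authors' code. It builds the classical Aho-Corasick automaton, completed into a DFA over a given alphabet so that it recognizes \(A^*X\). It then counts the states of the minimal automaton using Moore's partition refinement, a simple standard method that is slower in the worst case than Hopcroft's \({\mathcal O}(n\log n)\) algorithm and is used here only for brevity. The sketch does not implement the paper's linear pseudo-minimization, whose details are not given here. It only shows, on a small example, that the Aho-Corasick automaton can be larger than the minimal automaton of \(A^*X\). The function names `aho_corasick_dfa` and `minimal_size` are hypothetical, and the alphabet is assumed to contain every letter occurring in X.

```python
from collections import deque


def aho_corasick_dfa(words, alphabet):
    """Complete Aho-Corasick DFA for A*X (hypothetical helper).

    States are the prefixes of the words of X (trie nodes, state 0 = empty
    prefix); a state is final when some suffix of its prefix belongs to X.
    Assumes `alphabet` contains every letter used in `words`.
    """
    goto, final = [{}], [False]
    for w in words:                      # 1. build the trie of X
        s = 0
        for a in w:
            if a not in goto[s]:
                goto[s][a] = len(goto)
                goto.append({})
                final.append(False)
            s = goto[s][a]
        final[s] = True

    n = len(goto)
    fail = [0] * n                       # failure links (depth-1 states -> root)
    delta = [{} for _ in range(n)]       # complete transition function
    for a in alphabet:                   # missing root edges loop back to the root
        delta[0][a] = goto[0].get(a, 0)
    queue = deque(goto[0].values())
    while queue:                         # 2. BFS in order of increasing depth
        s = queue.popleft()
        final[s] = final[s] or final[fail[s]]
        for a in alphabet:
            if a in goto[s]:
                t = goto[s][a]
                fail[t] = delta[fail[s]][a]
                delta[s][a] = t
                queue.append(t)
            else:
                delta[s][a] = delta[fail[s]][a]
    return delta, final


def minimal_size(delta, final, alphabet):
    """State count of the minimal automaton, via Moore's partition refinement.

    Assumes every state is accessible, which holds for the trie-based DFA.
    """
    cls = [1 if f else 0 for f in final]
    while True:
        sigs = [(cls[s],) + tuple(cls[delta[s][a]] for a in alphabet)
                for s in range(len(delta))]
        ids = {}
        new = [ids.setdefault(sig, len(ids)) for sig in sigs]
        if len(ids) == len(set(cls)):    # no class was split: partition is stable
            return len(ids)
        cls = new


delta, final = aho_corasick_dfa(["ab", "b"], "ab")
print(len(delta), minimal_size(delta, final, "ab"))  # prints: 4 2
```

Take X = {ab, b} over A = {a, b}. The Aho-Corasick automaton has four states, one for each of the prefixes ε, a, ab and b. However, \(A^*X = A^*b\) needs only two states, because the states ab and b are equivalent. For X = {aa}, both automata have three states. The first case is the kind of gap that the pseudo-minimization described in the abstract aims to reduce in linear time.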

This work was completed with the support of the ANR project MAGNUM number 2010-BLAN-0204.

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

  2. AitMous, O., Bassino, F., Nicaud, C.: Building the Minimal Automaton of A*X in Linear Time, When X Is of Bounded Cardinality. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 275–287. Springer, Heidelberg (2010)

  3. Baker, T.P.: A technique for extending rapid exact-match string matching to arrays of more than one dimension. SIAM J. Comput., 533–541 (1978)

  4. Bassino, F., Giambruno, L., Nicaud, C.: The average state complexity of rational operations on finite languages. Int. J. Found. Comput. Sci. 21(4), 495–516 (2010)

  5. Bird, R.S.: Two dimensional pattern matching. Inf. Process. Lett. 6(5), 168–170 (1977)

  6. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on strings. Cambridge University Press (2007)

  7. Crochemore, M., Rytter, W.: Text Algorithms. Oxford Univ. Press (1994)

  8. Hopcroft, J.E.: An n log n algorithm for minimizing states in a finite automaton. In: Theory of Machines and Computations, pp. 189–196. Academic Press (1971)

  9. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley (1979)

  10. Lothaire, M.: Applied Combinatorics on Words. Cambridge University Press (2005)

  11. Revuz, D.: Dictionnaires et lexiques: methodes et algorithmes. PhD thesis, Institut Blaise Pascal (1991)

  12. Revuz, D.: Minimisation of acyclic deterministic automata in linear time. Theoret. Comput. Sci. 92(1), 181–189 (1992)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

AitMous, O., Bassino, F., Nicaud, C. (2012). An Efficient Linear Pseudo-minimization Algorithm for Aho-Corasick Automata. In: Kärkkäinen, J., Stoye, J. (eds) Combinatorial Pattern Matching. CPM 2012. Lecture Notes in Computer Science, vol 7354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31265-6_9

  • DOI: https://doi.org/10.1007/978-3-642-31265-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31264-9

  • Online ISBN: 978-3-642-31265-6

  • eBook Packages: Computer Science, Computer Science (R0)
