
Online Matching of Multiple Regular Patterns with Gaps and Character Classes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7810)

Abstract

Given a dictionary D of regular expressions and a text T, the online regular-pattern-matching problem is to single out, for each text position T[c], those expressions in D that have a match ending at T[c], while processing T only once. This problem is considered in the context of regular patterns over bounded-length gaps and keywords, where the gaps are specified by wildcards and character classes and the keywords are strings over the input alphabet. Our algorithm is based on constructing the Aho–Corasick pattern-matching automaton for the set of keywords, and representing as a bit vector the set of keywords that can precede a given keyword in a regular-pattern instance. For a dictionary D with r patterns and with \(k_i\) keywords in pattern i, the preprocessing takes time \(O(|D| + \sum_{i=1}^r k_i^2 \log k_i / w)\), where w denotes the number of bits in a memory word. When only fixed-length wildcard gaps without character classes are allowed, the time spent by our matching algorithm for each text character T[c] is at most \(O((\log r + k/w)(K_c + 1))\), where \(k = \max\{k_1, \ldots, k_r\}\) and \(K_c\) is the number of keyword occurrences in D matched at text position T[c].
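The problem statement can be illustrated with a small sketch. The code below is not the algorithm of the paper; it is a simplified illustration that assumes only fixed-length wildcard gaps (no character classes, no variable-length gaps). It builds an Aho–Corasick automaton over all keywords and, for each text position, records in an integer bit vector which keyword prefixes of each pattern end there. This bit vector is a simplification chosen for the sketch and differs from the preceding-keyword bit vectors described above. The names `build_automaton`, `match_online`, and the `(keywords, gaps)` pattern encoding are hypothetical.

```python
from collections import deque


def build_automaton(keywords):
    """Aho-Corasick automaton; out[s] lists the ids of keywords ending at state s."""
    goto, fail, out = [{}], [0], [[]]
    for kid, word in enumerate(keywords):
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(kid)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit keywords that are suffixes
    return goto, fail, out


def match_online(patterns, text):
    """Yield (c, i) whenever pattern i has a match ending at text[c].

    Assumed encoding: patterns[i] = (keywords, gaps), where gaps[j] is the
    exact number of wildcard characters between keywords j and j + 1.
    Keywords are assumed to be non-empty strings.
    """
    kw_ids, uses = {}, []  # uses[kid] = list of (pattern index, keyword index)
    for i, (kws, _gaps) in enumerate(patterns):
        for j, word in enumerate(kws):
            kid = kw_ids.setdefault(word, len(kw_ids))
            if kid == len(uses):
                uses.append([])
            uses[kid].append((i, j))
    goto, fail, out = build_automaton(list(kw_ids))

    # ended[p][i]: bit j is set if keywords 0..j of pattern i match with
    # keyword j ending at text position p.
    ended = {}
    s = 0
    for c, ch in enumerate(text):  # the text is scanned exactly once
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for kid in out[s]:
            for i, j in uses[kid]:
                kws, gaps = patterns[i]
                if j > 0:
                    # Position where the preceding keyword must have ended.
                    prev_end = c - len(kws[j]) - gaps[j - 1]
                    mask = ended.get(prev_end, {}).get(i, 0)
                    if not (mask >> (j - 1)) & 1:
                        continue
                row = ended.setdefault(c, {})
                row[i] = row.get(i, 0) | (1 << j)
                if j == len(kws) - 1:
                    yield (c, i)


# Pattern 0: "ab", two wildcards, "cd".  Pattern 1: the single keyword "cd".
demo = [(["ab", "cd"], [2]), (["cd"], [])]
matches = list(match_online(demo, "abxxcd"))
```

For simplicity the sketch keeps the `ended` table for the whole text; an implementation meant for long streams would presumably retain only a window as long as the longest pattern instance.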

This work was supported by the Academy of Finland.


References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)


  2. Bille, P.: New algorithms for regular expression matching. In: Proc. of the 33rd Internat. Colloq. Automata, Languages and Programming, pp. 643–654 (2006)


  3. Bille, P., Farach-Colton, M.: Fast and compact regular expression matching. Theor. Comput. Sci. 409(3), 486–496 (2008)


  4. Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String matching with variable length gaps. Theor. Comput. Sci. 443, 25–34 (2012)


  5. Bille, P., Thorup, M.: Faster regular expression matching. In: Proc. of the 36th Internat. Colloq. Automata, Languages and Programming, pp. 171–182 (2009)


  6. Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals. In: Proc. of the 21st Annual ACM-SIAM Symp. on Discrete Algorithms, pp. 1297–1308 (2010)


  7. Haapasalo, T., Silvasti, P., Sippu, S., Soisalon-Soininen, E.: Online Dictionary Matching with Variable-Length Gaps. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 76–87. Springer, Heidelberg (2011)


  8. Myers, E.W.: A four Russians algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)


  9. Schnitger, G.: Regular expressions and NFAs without ε-transitions. In: Proc. of the 23rd Annual Symp. on Theoretical Aspects of Computer Science, pp. 432–443 (2006)


  10. Sen, S., Spatscheck, O., Wang, D.: Accurate, scalable in-network identification of p2p traffic using application signatures. In: Proc. of the 13th Internat. Conf. on World Wide Web, pp. 512–521 (2004)


  11. Thompson, K.: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968)



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sippu, S., Soisalon-Soininen, E. (2013). Online Matching of Multiple Regular Patterns with Gaps and Character Classes. In: Dediu, AH., Martín-Vide, C., Truthe, B. (eds) Language and Automata Theory and Applications. LATA 2013. Lecture Notes in Computer Science, vol 7810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37064-9_46


  • DOI: https://doi.org/10.1007/978-3-642-37064-9_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37063-2

  • Online ISBN: 978-3-642-37064-9

  • eBook Packages: Computer Science, Computer Science (R0)
