Skip to main content

Optimally Streaming Greedy Regular Expression Parsing

  • Conference paper
Theoretical Aspects of Computing – ICTAC 2014 (ICTAC 2014)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8687))

Included in the following conference series:

Abstract

We study the problem of streaming regular expression parsing: Given a regular expression and an input stream of symbols, how to output a serialized syntax tree representation as an output stream during input stream processing.

We show that optimally streaming regular expression parsing, outputting bits of the output as early as is semantically possible for any regular expression of size m and any input string of length n, can be performed in time O(2m logm + m n) on a unit-cost random-access machine. This is for the wide-spread greedy disambiguation strategy for choosing parse trees of grammatically ambiguous regular expressions. In particular, for a fixed regular expression, the algorithm’s run-time scales linearly with the input string length. The exponential is due to the need for preprocessing the regular expression to analyze state coverage of its associated NFA, a PSPACE-hard problem, and tabulating all reachable ordered sets of NFA-states.

Previous regular expression parsing algorithms operate in multiple phases, always requiring processing or storing the whole input string before outputting the first bit of output, not only for those regular expressions and input prefixes where reading to the end of the input is strictly necessary.

This work has been partially supported by The Danish Council for Independent Research under Project 11-106278, “Kleene Meets Church: Regular Expressions and Types”. The order of authors is insignificant.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Information and computation 140(2), 229–253 (1998), http://dx.doi.org/10.1006/inco.1997.2688

    Article  MathSciNet  MATH  Google Scholar 

  2. Debarbieux, D., Gauwin, O., Niehren, J., Sebastian, T., Zergaoui, M.: Early nested word automata for XPath query answering on XML streams. In: Konstantinidis, S. (ed.) CIAA 2013. LNCS, vol. 7982, pp. 292–305. Springer, Heidelberg (2013)

    Google Scholar 

  3. Dubé, D., Feeley, M.: Efficiently building a parse tree from a regular expression. Acta Informatica 37(2), 121–144 (2000), http://dx.doi.org/10.1007/s002360000037

    Article  MathSciNet  MATH  Google Scholar 

  4. Fowler, G.: An interpretation of the POSIX regex standard (January 2003), http://www2.research.att.com/~astopen/testregex/re-interpretation.html

  5. Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 618–629. Springer, Heidelberg (2004), http://dx.doi.org/10.1007/978-3-540-27836-8_53

    Chapter  Google Scholar 

  6. Grathwohl, N.B.B., Henglein, F., Nielsen, L., Rasmussen, U.T.: Two-pass greedy regular expression parsing. In: Konstantinidis, S. (ed.) CIAA 2013. LNCS, vol. 7982, pp. 60–71. Springer, Heidelberg (2013), http://dx.doi.org/10.1007/978-3-642-39274-0_7

    Google Scholar 

  7. Gupta, A.K., Suciu, D.: Stream processing of XPath queries with predicates. In: Proc. 2003 ACM SIGMOD International Conference on Management of Data, SIGMOD 2003, pp. 419–430. ACM, New York (2003), http://dx.doi.org/10.1145/872757.872809

    Google Scholar 

  8. IEEE Computer Society: Standard for Information Technology - Portable Operating System Interface (POSIX), Base Specifications, Issue 7. IEEE (2008), http://dx.doi.org/10.1109/IEEESTD.2008.4694976 , IEEE Std 1003.1

  9. Kearns, S.: Extending regular expressions with context operators and parse extraction. Software - Practice and Experience 21(8), 787–804 (1991), http://dx.doi.org/10.1002/spe.4380210803

    Article  Google Scholar 

  10. Nielsen, L., Henglein, F.: Bit-coded regular expression parsing. In: Dediu, A.-H., Inenaga, S., Martín-Vide, C. (eds.) LATA 2011. LNCS, vol. 6638, pp. 402–413. Springer, Heidelberg (2011), http://dx.doi.org/10.1007/978-3-642-21254-3_32

    Chapter  Google Scholar 

  11. Okui, S., Suzuki, T.: Disambiguation in regular expression matching via position automata with augmented transitions. In: Domaratzki, M., Salomaa, K. (eds.) CIAA 2010. LNCS, vol. 6482, pp. 231–240. Springer, Heidelberg (2011), http://dx.doi.org/10.1007/978-3-642-18098-9_25

    Chapter  Google Scholar 

  12. Stockmeyer, L.J., Meyer, A.R.: Word problems requiring exponential time (preliminary report). In: Proc. Fifth Annual ACM Symposium on Theory of Computing, pp. 1–9. ACM (1973)

    Google Scholar 

  13. Sulzmann, M., Lu, K.Z.M.: Regular expression sub-matching using partial derivatives. In: Proc. 14th Symposium on Principles and Practice of Declarative Programming, PPDP 2012, pp. 79–90. ACM, New York (2012), http://dx.doi.org/10.1145/2370776.2370788

    Google Scholar 

  14. Sulzmann, M., Lu, K.Z.M.: POSIX regular expression parsing with derivatives. In: Codish, M., Sumii, E. (eds.) FLOPS 2014. LNCS, vol. 8475, pp. 203–220. Springer, Heidelberg (2014), http://dx.doi.org/10.1007/978-3-319-07151-0_13

    Chapter  Google Scholar 

  15. Thompson, K.: Programming techniques: Regular expression search algorithm. Commun. ACM 11(6), 419–422 (1968), http://dx.doi.org/10.1145/363347.363387

    Article  MATH  Google Scholar 

  16. Wall, L., Christiansen, T., Orwant, J.: Programming Perl. O’Reilly Media, Incorporated (2000)

    Google Scholar 

  17. Wu, X., Theodoratos, D.: A survey on XML streaming evaluation techniques. The VLDB Journal 22(2), 177–202 (2013), http://dx.doi.org/10.1007/s00778-012-0281-y

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Grathwohl, N.B.B., Henglein, F., Rasmussen, U.T. (2014). Optimally Streaming Greedy Regular Expression Parsing. In: Ciobanu, G., Méry, D. (eds) Theoretical Aspects of Computing – ICTAC 2014. ICTAC 2014. Lecture Notes in Computer Science, vol 8687. Springer, Cham. https://doi.org/10.1007/978-3-319-10882-7_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-10882-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10881-0

  • Online ISBN: 978-3-319-10882-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics