Two-Pass Greedy Regular Expression Parsing

  • Conference paper
Implementation and Application of Automata (CIAA 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7982)

Abstract

We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only kn bits of sequentially written and read log storage, where \(k < \frac{1}{3} m\) is the number of alternatives and Kleene stars in the RE; and processing the input string as a symbol stream without requiring it to be stored at all. Previous RE parsing algorithms either do not scale linearly with input size, or require substantially more log storage and employ 3 passes, the first of which reverses the input, or do not produce a greedy parse (or are not known to). The performance of our unoptimized C-based prototype indicates that the lean-log algorithm also performs better in practice and is surprisingly competitive with RE tools that do not perform full parsing, such as Grep.
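To make the terms "greedy parse" and "bit-coded parse tree" concrete, the following is a minimal Python sketch. It is not the lean-log algorithm from the paper: it is a naive backtracking enumerator (exponential in the worst case) that only illustrates what output a greedy parser is expected to produce. The tuple encoding of REs, the function names `parses`, `greedy_bits` and `choice_count`, and the bit convention (0 = left alternative or "iterate again", 1 = right alternative or "leave the star") are assumptions made for this illustration. The sketch also assumes that star bodies consume at least one symbol per iteration.

```python
# REs as nested tuples (hypothetical encoding for this sketch):
#   ("eps",)  ("lit", c)  ("seq", r1, r2)  ("alt", r1, r2)  ("star", r)

def parses(re, s, i):
    """Yield (end, bits) for each way `re` matches s[i:end].

    Results come out in lexicographic order of their bit codes, so the
    first full match is the greedy (leftmost-preferring) parse.
    """
    kind = re[0]
    if kind == "eps":
        yield i, ""
    elif kind == "lit":
        if i < len(s) and s[i] == re[1]:
            yield i + 1, ""
    elif kind == "seq":
        for j, b1 in parses(re[1], s, i):
            for k, b2 in parses(re[2], s, j):
                yield k, b1 + b2
    elif kind == "alt":
        for j, b in parses(re[1], s, i):
            yield j, "0" + b          # left alternative is tried first
        for j, b in parses(re[2], s, i):
            yield j, "1" + b
    elif kind == "star":
        for j, b1 in parses(re[1], s, i):
            if j > i:                 # assumption: ignore empty iterations
                for k, b2 in parses(re, s, j):
                    yield k, "0" + b1 + b2
        yield i, "1"                  # leaving the star is tried last

def greedy_bits(re, s):
    """Return the bit code of the greedy parse of all of `s`, or None."""
    for end, bits in parses(re, s, 0):
        if end == len(s):
            return bits
    return None

def choice_count(re):
    """Count alternatives and Kleene stars (the k of the abstract)."""
    own = 1 if re[0] in ("alt", "star") else 0
    return own + sum(choice_count(c) for c in re[1:] if isinstance(c, tuple))

# (a|b)* on "ab": iterate, pick a, iterate, pick b, stop.
STAR_AB = ("star", ("alt", ("lit", "a"), ("lit", "b")))
# (a|ab)(b|eps) on "ab" is ambiguous; the greedy parse takes the left branches.
AMBIG = ("seq",
         ("alt", ("lit", "a"), ("seq", ("lit", "a"), ("lit", "b"))),
         ("alt", ("lit", "b"), ("eps",)))

print(greedy_bits(STAR_AB, "ab"))   # 00011
print(greedy_bits(AMBIG, "ab"))     # 00
print(choice_count(STAR_AB))        # 2
```

For `STAR_AB`, k = 2, so on an input of n = 2 symbols the kn bound from the abstract would allow at most 4 bits of log storage. The sketch itself keeps no such log and relies on backtracking instead.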

This work has been partially supported by The Danish Council for Independent Research under Project 11-106278, “Kleene Meets Church: Regular Expressions and Types”. The order of authors is insignificant.



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grathwohl, N.B.B., Henglein, F., Nielsen, L., Rasmussen, U.T. (2013). Two-Pass Greedy Regular Expression Parsing. In: Konstantinidis, S. (eds) Implementation and Application of Automata. CIAA 2013. Lecture Notes in Computer Science, vol 7982. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39274-0_7

  • DOI: https://doi.org/10.1007/978-3-642-39274-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39273-3

  • Online ISBN: 978-3-642-39274-0

  • eBook Packages: Computer Science, Computer Science (R0)
