POSIX Regular Expression Parsing with Derivatives

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8475)

Abstract

We adapt the POSIX policy to the setting of regular expression parsing. POSIX favors longest left-most parse trees. Compared to other policies such as greedy left-most, the POSIX policy is more intuitive but much harder to implement. As observed by Kuklewicz, almost all POSIX implementations are buggy. We show how to obtain a POSIX algorithm for the general parsing problem based on Brzozowski's regular expression derivatives. Correctness is fairly straightforward to establish, and our benchmark results show that our approach is promising.
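The derivative-based idea in the abstract can be illustrated with a short Python example. It takes one Brzozowski derivative per input character and then, walking back, injects each character into a parse tree for the original expression. This is a minimal sketch under stated assumptions: a small regex syntax without character classes, no simplification of derivatives (so their size can grow quickly), and parse trees encoded as tuples. The names `der`, `mkeps`, `inj` and `parse` and the tuple encoding are hypothetical illustrations, not the paper's code or API, and the sketch is not claimed to reproduce the paper's exact algorithm or its correctness argument.

```python
from dataclasses import dataclass

# Hypothetical minimal regex AST for this sketch
# (assumption: no character classes, no simplification of derivatives).
@dataclass(frozen=True)
class Zero: pass          # matches no string
@dataclass(frozen=True)
class One: pass           # matches only the empty string
@dataclass(frozen=True)
class Chr:                # the single character c
    c: str
@dataclass(frozen=True)
class Alt:                # r1 + r2
    r1: object
    r2: object
@dataclass(frozen=True)
class Seq:                # r1 r2
    r1: object
    r2: object
@dataclass(frozen=True)
class Star:               # r*
    r: object

# Parse trees (assumed encoding): ("Empty",), ("Char", c), ("Left", v),
# ("Right", v), ("Seq", v1, v2) and ("Stars", (v1, ..., vn)).

def nullable(r):
    """Does r accept the empty string?"""
    if isinstance(r, (One, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.r1) or nullable(r.r2)
    if isinstance(r, Seq):
        return nullable(r.r1) and nullable(r.r2)
    return False  # Zero, Chr

def der(c, r):
    """Brzozowski derivative of r with respect to the character c."""
    if isinstance(r, Chr):
        return One() if r.c == c else Zero()
    if isinstance(r, Alt):
        return Alt(der(c, r.r1), der(c, r.r2))
    if isinstance(r, Seq):
        first = Seq(der(c, r.r1), r.r2)
        return Alt(first, der(c, r.r2)) if nullable(r.r1) else first
    if isinstance(r, Star):
        return Seq(der(c, r.r), r)
    return Zero()  # Zero, One

def mkeps(r):
    """Parse tree for the empty string, preferring left alternatives."""
    if isinstance(r, One):
        return ("Empty",)
    if isinstance(r, Alt):
        return ("Left", mkeps(r.r1)) if nullable(r.r1) else ("Right", mkeps(r.r2))
    if isinstance(r, Seq):
        return ("Seq", mkeps(r.r1), mkeps(r.r2))
    if isinstance(r, Star):
        return ("Stars", ())
    raise ValueError("mkeps: regex is not nullable")

def inj(r, c, v):
    """Turn a parse tree v for der(c, r) into one for r by re-inserting c."""
    if isinstance(r, Chr):                  # v is ("Empty",)
        return ("Char", c)
    if isinstance(r, Alt):                  # keep the chosen side
        sub = r.r1 if v[0] == "Left" else r.r2
        return (v[0], inj(sub, c, v[1]))
    if isinstance(r, Seq):
        if v[0] == "Seq":                   # r1 not nullable: c went to r1
            return ("Seq", inj(r.r1, c, v[1]), v[2])
        if v[0] == "Left":                  # r1 nullable, but c still went to r1
            _, v1, v2 = v[1]
            return ("Seq", inj(r.r1, c, v1), v2)
        return ("Seq", mkeps(r.r1), inj(r.r2, c, v[1]))  # r1 matched empty
    if isinstance(r, Star):                 # v is ("Seq", v1, ("Stars", vs))
        _, v1, (_, vs) = v
        return ("Stars", (inj(r.r, c, v1),) + vs)
    raise ValueError("inj: parse tree does not fit regex")

def parse(r, s):
    """Derive r by each character of s, then inject the characters back."""
    if not s:
        if not nullable(r):
            raise ValueError("no match")
        return mkeps(r)
    return inj(r, s[0], parse(der(s[0], r), s[1:]))

if __name__ == "__main__":
    # (a + ab)(b + epsilon) on input "ab"
    r = Seq(Alt(Chr("a"), Seq(Chr("a"), Chr("b"))), Alt(Chr("b"), One()))
    print(parse(r, "ab"))
    # ('Seq', ('Right', ('Seq', ('Char', 'a'), ('Char', 'b'))), ('Right', ('Empty',)))
```

On the classic ambiguous expression (a + ab)(b + ε) with input `ab`, the sketch assigns `ab` to the first subexpression and the empty string to the second, which is the longest left-most (POSIX) choice; a greedy left-most policy would instead take `a` from the left alternative and then `b`. A practical version would also simplify derivatives, which in turn requires correspondingly adjusting the parse trees; the sketch omits both for clarity.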

References

  1. Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11(4), 481–494 (1964)

  2. Cox, R.: re2 – an efficient, principled regular expression library, http://code.google.com/p/re2/

  3. Cox, R.: NFA POSIX (2007), http://swtch.com/~rsc/regexp/nfa-posix.y.txt

  4. Cox, R.: Regular expression matching: the virtual machine approach - digression: Posix submatching (2009), http://swtch.com/~rsc/regexp/regexp2.html

  5. Cox, R.: Regular expression matching in the wild (2010), http://swtch.com/~rsc/regexp/regexp3.html

  6. http://hackage.haskell.org/package/dequeue-0.1.5/docs/Data-Dequeue.html

  7. Dubé, D., Feeley, M.: Efficiently building a parse tree from a regular expression. Acta Inf. 37(2), 121–144 (2000)

  8. Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 618–629. Springer, Heidelberg (2004)

  9. Grathwohl, N.B.B., Henglein, F., Nielsen, L., Rasmussen, U.T.: Two-pass greedy regular expression parsing. In: Konstantinidis, S. (ed.) CIAA 2013. LNCS, vol. 7982, pp. 60–71. Springer, Heidelberg (2013)

  10. Institute of Electrical and Electronics Engineers (IEEE): Standard for information technology – Portable Operating System Interface (POSIX) – Part 2 (Shell and utilities), Section 2.8 (Regular expression notation). IEEE Standard 1003.2, New York (1992)

  11. Kuklewicz, C.: Regex POSIX, http://www.haskell.org/haskellwiki/Regex_Posix

  12. Kuklewicz, C.: The regex-posix-unittest package, http://hackage.haskell.org/package/regex-posix-unittest

  13. Kuklewicz, C.: Forward regular expression matching with bounded space (2007), http://haskell.org/haskellwiki/RegexpDesign

  14. Laurikari, V.: NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In: SPIRE, pp. 181–187 (2000)

  15. Lu, K.Z.M., Sulzmann, M.: POSIX Submatching with Regular Expression Derivatives, http://code.google.com/p/xhaskell-regex-deriv

  16. Might, M., Darais, D., Spiewak, D.: Parsing with derivatives: a functional pearl. In: Proc. of ICFP 2011, pp. 189–195. ACM (2011)

  17. Nielsen, L., Henglein, F.: Bit-coded regular expression parsing. In: Dediu, A.-H., Inenaga, S., Martín-Vide, C. (eds.) LATA 2011. LNCS, vol. 6638, pp. 402–413. Springer, Heidelberg (2011)

  18. Okasaki, C.: Purely functional data structures. Cambridge University Press (1999)

  19. Okui, S., Suzuki, T.: Disambiguation in regular expression matching via position automata with augmented transitions. In: Domaratzki, M., Salomaa, K. (eds.) CIAA 2010. LNCS, vol. 6482, pp. 231–240. Springer, Heidelberg (2011)

  20. Owens, S., Reppy, J., Turon, A.: Regular-expression derivatives reexamined. Journal of Functional Programming 19(2), 173–190 (2009)

  21. PCRE - Perl Compatible Regular Expressions, http://www.pcre.org/

  22. regex-posix: The posix regex backend for regex-base, http://hackage.haskell.org/package/regex-posix

  23. regex-tdfa: A new all haskell tagged dfa regex engine, inspired by libtre, http://hackage.haskell.org/package/regex-tdfa

  24. Sulzmann, M., Lu, K.Z.M.: Regular expression sub-matching using partial derivatives. In: Proc. of PPDP 2012, pp. 79–90. ACM (2012)

  25. Vansummeren, S.: Type inference for unique pattern matching. ACM TOPLAS 28(3), 389–428 (2006)

  26. Vouillon, J.: ocaml-re - Pure OCaml regular expressions, with support for Perl and POSIX-style strings, https://github.com/avsm/ocaml-re

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sulzmann, M., Lu, K.Z.M. (2014). POSIX Regular Expression Parsing with Derivatives. In: Codish, M., Sumii, E. (eds) Functional and Logic Programming. FLOPS 2014. Lecture Notes in Computer Science, vol 8475. Springer, Cham. https://doi.org/10.1007/978-3-319-07151-0_13

  • DOI: https://doi.org/10.1007/978-3-319-07151-0_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07150-3

  • Online ISBN: 978-3-319-07151-0

  • eBook Packages: Computer Science, Computer Science (R0)
