Abstract
We adapt the POSIX policy to the setting of regular expression parsing. POSIX favors longest left-most parse trees. Compared to other policies such as greedy left-most, the POSIX policy is more intuitive but much harder to implement. Almost all POSIX implementations are buggy as observed by Kuklewicz. We show how to obtain a POSIX algorithm for the general parsing problem based on Brzozowski’s regular expression derivatives. Correctness is fairly straightforward to establish and our benchmark results show that our approach is promising.
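To make the idea concrete, here is a minimal sketch of derivative-based parsing that produces a POSIX parse tree. It assumes the usual construction: take derivatives character by character, build a parse tree for the empty string once the input is consumed, then inject the characters back in reverse order. The tuple encoding and the names `deriv`, `mkeps`, `inject`, `parse` and `flatten` are choices made for this illustration, not taken from the paper. The sketch omits simplification of derivatives, so it is not efficient.

```python
import re

# Regular expressions as tagged tuples (an encoding chosen for this sketch).
ZERO = ("zero",)   # matches nothing
ONE = ("one",)     # matches only the empty string

def char(c): return ("chr", c)
def alt(r1, r2): return ("alt", r1, r2)
def seq(r1, r2): return ("seq", r1, r2)
def star(r): return ("star", r)

def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("one", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return False  # zero, chr

def deriv(r, c):
    """Brzozowski derivative of r with respect to the character c."""
    tag = r[0]
    if tag == "chr":
        return ONE if r[1] == c else ZERO
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "seq":
        left = seq(deriv(r[1], c), r[2])
        return alt(left, deriv(r[2], c)) if nullable(r[1]) else left
    if tag == "star":
        return seq(deriv(r[1], c), r)
    return ZERO  # zero, one

def mkeps(r):
    """Parse tree for the empty string, preferring the left alternative."""
    tag = r[0]
    if tag == "one":
        return ("empty",)
    if tag == "alt":
        return ("left", mkeps(r[1])) if nullable(r[1]) else ("right", mkeps(r[2]))
    if tag == "seq":
        return ("pair", mkeps(r[1]), mkeps(r[2]))
    if tag == "star":
        return ("stars", ())
    raise ValueError("expression is not nullable")

def inject(r, c, v):
    """Turn a parse tree v for deriv(r, c) into a parse tree for r."""
    tag = r[0]
    if tag == "chr":
        return ("char", c)
    if tag == "alt":
        side = 1 if v[0] == "left" else 2
        return (v[0], inject(r[side], c, v[1]))
    if tag == "seq":
        if v[0] == "pair":    # first component was not nullable
            return ("pair", inject(r[1], c, v[1]), v[2])
        if v[0] == "left":    # c was consumed by the first component
            return ("pair", inject(r[1], c, v[1][1]), v[1][2])
        # v[0] == "right": first component is empty, c goes to the second
        return ("pair", mkeps(r[1]), inject(r[2], c, v[1]))
    if tag == "star":         # v pairs one new iteration with the remaining ones
        return ("stars", (inject(r[1], c, v[1]),) + v[2][1])
    raise ValueError("no parse tree to inject into")

def parse(r, s):
    """Return a parse tree for s against r, or None if s does not match."""
    if not s:
        return mkeps(r) if nullable(r) else None
    v = parse(deriv(r, s[0]), s[1:])
    return None if v is None else inject(r, s[0], v)

def flatten(v):
    """Recover the substring that a parse tree covers."""
    tag = v[0]
    if tag == "empty":
        return ""
    if tag == "char":
        return v[1]
    if tag in ("left", "right"):
        return flatten(v[1])
    if tag == "pair":
        return flatten(v[1]) + flatten(v[2])
    return "".join(flatten(x) for x in v[1])  # stars

# (a|ab)(c|bcd)(d*) matched against "abcd"
a, b, c, d = (char(x) for x in "abcd")
r = seq(alt(a, seq(a, b)), seq(alt(c, seq(b, seq(c, d))), star(d)))
tree = parse(r, "abcd")
posix_groups = (flatten(tree[1]), flatten(tree[2][1]), flatten(tree[2][2]))

# Python's backtracking `re` module follows the greedy left-most policy instead.
greedy_groups = re.match(r"(a|ab)(c|bcd)(d*)", "abcd").groups()
```

In this example `posix_groups` is `("ab", "c", "d")`, because the first subexpression takes the longest possible match, whereas `greedy_groups` is `("a", "bcd", "")`, because the first alternative that leads to an overall match wins.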
References
Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11(4), 481–494 (1964)
Cox, R.: re2 – an efficient, principled regular expression library, http://code.google.com/p/re2/
Cox, R.: NFA POSIX (2007), http://swtch.com/~rsc/regexp/nfa-posix.y.txt
Cox, R.: Regular expression matching: the virtual machine approach - digression: Posix submatching (2009), http://swtch.com/~rsc/regexp/regexp2.html
Cox, R.: Regular expression matching in the wild (2010), http://swtch.com/~rsc/regexp/regexp3.html
http://hackage.haskell.org/package/dequeue-0.1.5/docs/Data-Dequeue.html
Dubé, D., Feeley, M.: Efficiently building a parse tree from a regular expression. Acta Inf. 37(2), 121–144 (2000)
Frisch, A., Cardelli, L.: Greedy regular expression matching. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 618–629. Springer, Heidelberg (2004)
Grathwohl, N.B.B., Henglein, F., Nielsen, L., Rasmussen, U.T.: Two-pass greedy regular expression parsing. In: Konstantinidis, S. (ed.) CIAA 2013. LNCS, vol. 7982, pp. 60–71. Springer, Heidelberg (2013)
Institute of Electrical and Electronics Engineers (IEEE): Standard for information technology – Portable Operating System Interface (POSIX) – Part 2 (Shell and utilities), Section 2.8 (Regular expression notation). IEEE Standard 1003.2, New York (1992)
Kuklewicz, C.: Regex POSIX, http://www.haskell.org/haskellwiki/Regex_Posix
Kuklewicz, C.: The regex-posix-unittest package, http://hackage.haskell.org/package/regex-posix-unittest
Kuklewicz, C.: Forward regular expression matching with bounded space (2007), http://haskell.org/haskellwiki/RegexpDesign
Laurikari, V.: NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions. In: SPIRE, pp. 181–187 (2000)
Lu, K.Z.M., Sulzmann, M.: POSIX Submatching with Regular Expression Derivatives, http://code.google.com/p/xhaskell-regex-deriv
Might, M., Darais, D., Spiewak, D.: Parsing with derivatives: a functional pearl. In: Proc. of ICFP 2011, pp. 189–195. ACM (2011)
Nielsen, L., Henglein, F.: Bit-coded regular expression parsing. In: Dediu, A.-H., Inenaga, S., Martín-Vide, C. (eds.) LATA 2011. LNCS, vol. 6638, pp. 402–413. Springer, Heidelberg (2011)
Okasaki, C.: Purely functional data structures. Cambridge University Press (1999)
Okui, S., Suzuki, T.: Disambiguation in regular expression matching via position automata with augmented transitions. In: Domaratzki, M., Salomaa, K. (eds.) CIAA 2010. LNCS, vol. 6482, pp. 231–240. Springer, Heidelberg (2011)
Owens, S., Reppy, J., Turon, A.: Regular-expression derivatives reexamined. Journal of Functional Programming 19(2), 173–190 (2009)
PCRE - Perl Compatible Regular Expressions, http://www.pcre.org/
regex-posix: The posix regex backend for regex-base, http://hackage.haskell.org/package/regex-posix
regex-tdfa: A new all haskell tagged dfa regex engine, inspired by libtre, http://hackage.haskell.org/package/regex-tdfa
Sulzmann, M., Lu, K.Z.M.: Regular expression sub-matching using partial derivatives. In: Proc. of PPDP 2012, pp. 79–90. ACM (2012)
Vansummeren, S.: Type inference for unique pattern matching. ACM TOPLAS 28(3), 389–428 (2006)
Vouillon, J.: ocaml-re - Pure OCaml regular expressions, with support for Perl and POSIX-style strings, https://github.com/avsm/ocaml-re
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Sulzmann, M., Lu, K.Z.M. (2014). POSIX Regular Expression Parsing with Derivatives. In: Codish, M., Sumii, E. (eds) Functional and Logic Programming. FLOPS 2014. Lecture Notes in Computer Science, vol 8475. Springer, Cham. https://doi.org/10.1007/978-3-319-07151-0_13
DOI: https://doi.org/10.1007/978-3-319-07151-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07150-3
Online ISBN: 978-3-319-07151-0
eBook Packages: Computer Science, Computer Science (R0)