DOI: 10.1145/2370776.2370788

Regular expression sub-matching using partial derivatives

Published: 19 September 2012

Abstract

Regular expression sub-matching is the problem of finding, for each sub-part of a regular expression, a matching sub-string. Prior work applies Thompson and Glushkov NFA methods to construct the matching automata. We propose the novel use of derivatives and partial derivatives for regular expression sub-matching. Our benchmarking results show that the run-time performance is promising and that our approach can be applied in practice.
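
For readers unfamiliar with partial derivatives, the following is a minimal sketch of the underlying idea in Python. It is not the paper's algorithm: it only decides whether a whole word matches and does not record sub-matches. The tuple encoding and the names `ch`, `alt`, `seq`, `star`, `nullable`, `pderiv`, and `matches` are assumptions made for this illustration. The partial derivative of a regular expression with respect to a letter is a set of expressions that together describe what may follow that letter, so the sets play the role of NFA states.

```python
# Regular expressions as tuples (a hypothetical encoding for this sketch).
EPS = ("eps",)                      # matches only the empty word

def ch(c):      return ("chr", c)   # a single letter
def alt(a, b):  return ("alt", a, b)
def star(a):    return ("star", a)

def seq(a, b):
    # Smart constructor: eps followed by b is just b, which keeps terms small.
    return b if a == EPS else ("seq", a, b)

def nullable(r):
    """True if r accepts the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "chr":
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])    # seq

def pderiv(r, c):
    """Set of partial derivatives of r with respect to the letter c."""
    tag = r[0]
    if tag == "eps":
        return frozenset()
    if tag == "chr":
        return frozenset([EPS]) if r[1] == c else frozenset()
    if tag == "alt":
        return pderiv(r[1], c) | pderiv(r[2], c)
    if tag == "seq":
        left = frozenset(seq(p, r[2]) for p in pderiv(r[1], c))
        # If the left part can be empty, c may also be consumed by the right part.
        return (left | pderiv(r[2], c)) if nullable(r[1]) else left
    # star: consume c inside one iteration, then continue with the star itself.
    return frozenset(seq(p, r) for p in pderiv(r[1], c))

def matches(r, word):
    """Simulate the partial-derivative automaton on word, one letter at a time."""
    states = frozenset([r])
    for c in word:
        states = frozenset().union(*(pderiv(s, c) for s in states))
    return any(nullable(s) for s in states)

# Example expression: (a|b)*abb
r = seq(star(alt(ch("a"), ch("b"))), seq(ch("a"), seq(ch("b"), ch("b"))))
```

With this sketch, `matches(r, "aababb")` holds and `matches(r, "abab")` does not. Extending the idea to sub-matching additionally requires tracking which sub-expression consumed which part of the input, which this example deliberately leaves out.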

Published In

PPDP '12: Proceedings of the 14th symposium on Principles and practice of declarative programming
September 2012
226 pages
ISBN: 9781450315227
DOI: 10.1145/2370776
Sponsors

  • Kuleuven Belgium

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. haskell
  2. symbolic automata construction

Qualifiers

  • Research-article

Conference

PPDP'12

Acceptance Rates

Overall Acceptance Rate 230 of 486 submissions, 47%

Cited By

  • (2025) RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement, and Restricted Lookarounds. Proceedings of the ACM on Programming Languages, 9:POPL, pages 1-32. DOI: 10.1145/3704837. Online publication date: 9-Jan-2025
  • (2024) Static Analysis for Checking the Disambiguation Robustness of Regular Expressions. Proceedings of the ACM on Programming Languages, 8:PLDI, pages 2073-2097. DOI: 10.1145/3656461. Online publication date: 20-Jun-2024
  • (2024) Lean Formalization of Extended Regular Expression Matching with Lookarounds. Proceedings of the 13th ACM SIGPLAN International Conference on Certified Programs and Proofs, pages 118-131. DOI: 10.1145/3636501.3636959. Online publication date: 9-Jan-2024
  • (2023) Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics. Proceedings of the ACM on Programming Languages, 7:PLDI, pages 1026-1049. DOI: 10.1145/3591262. Online publication date: 6-Jun-2023
  • (2022) From regular expression matching to parsing. Acta Informatica, 59:6, pages 709-724. DOI: 10.1007/s00236-022-00420-6. Online publication date: 30-Mar-2022
  • (2022) Manipulation of Regular Expressions Using Derivatives: An Overview. Implementation and Application of Automata, pages 19-33. DOI: 10.1007/978-3-031-07469-1_2. Online publication date: 28-May-2022
  • (2016) Kleenex: compiling nondeterministic transducers to deterministic streaming transducers. ACM SIGPLAN Notices, 51:1, pages 284-297. DOI: 10.1145/2914770.2837647. Online publication date: 11-Jan-2016
  • (2016) Kleenex: compiling nondeterministic transducers to deterministic streaming transducers. Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 284-297. DOI: 10.1145/2837614.2837647. Online publication date: 11-Jan-2016
  • (2016) Implementing Cost-Effective Data Collection and Extraction Processes with CollaMine. 2016 International Conference on Cloud Computing Research and Innovations (ICCCRI), pages 92-99. DOI: 10.1109/ICCCRI.2016.22. Online publication date: May-2016
  • (2016) Derivative-Based Diagnosis of Regular Expression Ambiguity. Implementation and Application of Automata, pages 260-272. DOI: 10.1007/978-3-319-40946-7_22. Online publication date: 6-Jul-2016