Symbolic Boolean derivatives for efficiently solving extended regular expression constraints

Published: 18 June 2021

ABSTRACT

The manipulation of raw string data is ubiquitous in security-critical software, and verification of such software relies on efficiently solving string and regular expression constraints via SMT. However, the typical case of Boolean combinations of regular expression constraints exposes blowup in existing techniques. To address solvability of such constraints, we propose a new theory of derivatives of symbolic extended regular expressions (extended meaning that complement and intersection are incorporated), and show how to apply this theory to obtain more efficient decision procedures. Our implementation of these ideas, built on top of Z3, matches or outperforms state-of-the-art solvers on standard and handwritten benchmarks, showing particular benefits on examples with Boolean combinations.

Our work is the first formalization of derivatives of regular expressions which both handles intersection and complement and works symbolically over an arbitrary character theory. It unifies existing approaches involving derivatives of extended regular expressions, alternating automata and Boolean automata by lifting them to a common symbolic platform. It relies on a parsimonious augmentation of regular expressions: a construct for symbolic conditionals is shown to be sufficient to obtain relevant closure properties for derivatives over extended regular expressions.
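
To make the idea of derivatives over extended regular expressions concrete, the following is a minimal sketch of the classical Brzozowski derivative extended with intersection and complement, working over concrete characters. It is not the symbolic construction proposed in the paper: it has no symbolic conditionals, no character theory, and no simplification of terms. All class and function names (`Empty`, `Eps`, `Chr`, `Cat`, `Alt`, `And`, `Not`, `Star`, `nullable`, `deriv`, `matches`) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Regex AST. Frozen dataclasses make terms immutable and hashable.
@dataclass(frozen=True)
class Empty:            # matches no string
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty string
    pass

@dataclass(frozen=True)
class Chr:              # matches one concrete character
    c: str

@dataclass(frozen=True)
class Cat:              # concatenation
    l: object
    r: object

@dataclass(frozen=True)
class Alt:              # union
    l: object
    r: object

@dataclass(frozen=True)
class And:              # intersection
    l: object
    r: object

@dataclass(frozen=True)
class Not:              # complement
    r: object

@dataclass(frozen=True)
class Star:             # Kleene star
    r: object

def nullable(r):
    """Return True iff r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.l) and nullable(r.r)
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    if isinstance(r, Not):
        return not nullable(r.r)
    raise TypeError(r)

def deriv(r, a):
    """Derivative of r with respect to character a (no simplification)."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, And):
        return And(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, Not):
        return Not(deriv(r.r, a))
    if isinstance(r, Star):
        return Cat(deriv(r.r, a), r)
    if isinstance(r, Cat):
        first = Cat(deriv(r.l, a), r.r)
        return Alt(first, deriv(r.r, a)) if nullable(r.l) else first
    raise TypeError(r)

def matches(r, s):
    """Match by taking one derivative per character, then testing nullability."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)

# Example: strings that contain 'a' AND do NOT contain 'b'.
TOP = Not(Empty())                                  # all strings
contains_a = Cat(TOP, Cat(Chr("a"), TOP))
contains_b = Cat(TOP, Cat(Chr("b"), TOP))
pattern = And(contains_a, Not(contains_b))
```

With these definitions, `matches(pattern, "aa")` holds while `matches(pattern, "ab")` does not. Because intersection and complement commute with the derivative, Boolean combinations are handled directly on the expression, without first building and combining automata. A practical implementation would also normalize terms to keep them from growing.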

Published in

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2021
1341 pages
ISBN: 9781450383912
DOI: 10.1145/3453483

Copyright © 2021 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers: research-article

Overall Acceptance Rate: 406 of 2,067 submissions, 20%
