Skip to main content

String Constraints with Regex-Counting and String-Length Solved More Efficiently

  • Conference paper
  • First Online:
Dependable Software Engineering. Theories, Tools, and Applications (SETTA 2023)

Abstract

Regular expressions (regex for short) and string-length function are widely used in string-manipulating programs. Counting is a frequently used feature in regexes that counts the number of matchings of sub-patterns. The state-of-the-art string solvers are incapable of solving string constraints with regex-counting and string-length efficiently, especially when the counting and length bounds are large. In this work, we propose an automata-theoretic approach for solving such class of string constraints. The main idea is to symbolically model the counting operators by registers in automata instead of unfolding them explicitly, thus alleviating the state explosion problem. Moreover, the string-length function is modeled by a register as well. As a result, the satisfiability of string constraints with regex-counting and string-length is reduced to the satisfiability of linear integer arithmetic, which the off-the-shelf SMT solvers can then solve. To improve the performance further, we also propose techniques to reduce the sizes of automata. We implement the algorithms and validate our approach on 48,843 benchmark instances. The experimental results show that our approach can solve more instances than the state-of-the-art solvers, at a comparable or faster speed, especially when the counting and length bounds are large.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    In the rest of this paper, for clarity, we use counting to denote expressions of the form \(e^{\{m, n\}}\) and \(e^{\{m, \infty \}}\), but not \(e^*\) or \(e^+\).

  2. 2.

    Initially, we used the majority vote of the results of the solvers as the ground truth. Nevertheless, on some problem instances, all the results of the three solvers in the Z3 family are wrong (after manual inspection), thus failing this approach on these instances.

References

  1. Abdulla, P.A., et al.: Efficient handling of string-number conversion. In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020, pp. 943–957. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3385412.3386034

  2. Barbosa, H., et al.: CVC5: a versatile and industrial-strength SMT solver. In: TACAS 2022. LNCS, vol. 13243, pp. 415–442. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99524-9_24

    Chapter  Google Scholar 

  3. Berzish, M.: Z3str4: a solver for theories over strings. Ph.D. thesis, University of Waterloo, Ontario, Canada (2021). https://hdl.handle.net/10012/17102

  4. Berzish, M., et al.: Towards more efficient methods for solving regular-expression heavy string constraints. Theor. Comput. Sci. 943, 50–72 (2023)

    Article  MathSciNet  Google Scholar 

  5. Berzish, M., Ganesh, V., Zheng, Y.: Z3str3: a string solver with theory-aware heuristics. In: 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, 2–6 October, pp. 55–59 (2017). https://doi.org/10.23919/FMCAD.2017.8102241

  6. Berzish, M., et al.: An SMT solver for regular expressions and linear arithmetic over string length. In: Silva, A., Leino, K.R.M. (eds.) CAV 2021. LNCS, vol. 12760, pp. 289–312. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81688-9_14

    Chapter  Google Scholar 

  7. Bui, D., contributors: Z3-trau (2019). https://github.com/diepbp/z3-trau

  8. Cavada, R., et al.: The nuXmv symbolic model checker. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 334–342. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_22

    Chapter  Google Scholar 

  9. Chapman, C., Stolee, K.T.: Exploring regular expression usage and context in Python. In: Zeller, A., Roychoudhury, A. (eds.) Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, 18–20 July 2016, pp. 282–293. ACM (2016). https://doi.org/10.1145/2931037.2931073

  10. Chen, H., Lu, P.: Checking determinism of regular expressions with counting. Inf. Comput. 241, 302–320 (2015). https://doi.org/10.1016/j.ic.2014.12.001

    Article  MathSciNet  Google Scholar 

  11. Chen, T., Chen, Y., Hague, M., Lin, A.W., Wu, Z.: What is decidable about string constraints with the replaceall function. PACMPL 2(POPL), 3:1–3:29 (2018). https://doi.org/10.1145/3158091

  12. Chen, T., et al.: Solving string constraints with regex-dependent functions through transducers with priorities and variables. Proc. ACM Program. Lang. 6(POPL), 1–31 (2022). https://doi.org/10.1145/3498707

  13. Chen, T., et al.: A decision procedure for path feasibility of string manipulating programs with integer data type. In: Hung, D.V., Sokolsky, O. (eds.) ATVA 2020. LNCS, vol. 12302, pp. 325–342. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59152-6_18

    Chapter  Google Scholar 

  14. Chen, T., Hague, M., Lin, A.W., Rümmer, P., Wu, Z.: Decision procedures for path feasibility of string-manipulating programs with complex operations. PACMPL 3(POPL) (2019). https://doi.org/10.1145/3290362

  15. D’Antoni, L.: Automatark: automata benchmark (2018). https://github.com/lorisdanto/automatark

  16. D’Antoni, L., Ferreira, T., Sammartino, M., Silva, A.: Symbolic register automata. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 3–21. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25540-4_1

    Chapter  Google Scholar 

  17. Davis, J.C., Coghlan, C.A., Servant, F., Lee, D.: The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2018, pp. 246–256. Association for Computing Machinery, New York (2018)

    Google Scholar 

  18. Davis, J.C., Michael IV, L.G., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019, pp. 443–454. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3338906.3338909

  19. Gelade, W., Gyssens, M., Martens, W.: Regular expressions with counting: weak versus strong determinism. SIAM J. Comput. 41(1), 160–190 (2012). https://doi.org/10.1137/100814196

    Article  MathSciNet  Google Scholar 

  20. Haase, C.: A survival guide to Presburger arithmetic. ACM SIGLOG News 5(3), 67–82 (2018). https://doi.org/10.1145/3242953.3242964

    Article  Google Scholar 

  21. Holík, L., Síc, J., Turonová, L., Vojnar, T.: Fast matching of regular patterns with synchronizing counting. In: Kupferman, O., Sobocinski, P. (eds.) FoSSaCS 2023. LNCS, vol. 13992, pp. 392–412. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-30829-1_19

    Chapter  Google Scholar 

  22. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Boston (1979)

    Google Scholar 

  23. Kaminski, M., Francez, N.: Finite-memory automata. In: Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science, vol. 2, pp. 683–688 (1990). https://doi.org/10.1109/FSCS.1990.89590

  24. Kulczynski, M., Manea, F., Nowotka, D., Poulsen, D.B.: ZaligVinder: a generic test framework for string solvers. J. Softw. Evol. Process 35(4), e2400 (2023). https://doi.org/10.1002/smr.2400

    Article  Google Scholar 

  25. Le Glaunec, A., Kong, L., Mamouras, K.: Regular expression matching using bit vector automata. Proc. ACM Program. Lang. 7(OOPSLA1) (2023). https://doi.org/10.1145/3586044

  26. Liang, T., Reynolds, A., Tinelli, C., Barrett, C., Deters, M.: A DPLL(T) theory solver for a theory of strings and regular expressions. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 646–662. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_43

    Chapter  Google Scholar 

  27. Liang, T., Tsiskaridze, N., Reynolds, A., Tinelli, C., Barrett, C.: A decision procedure for regular membership and length constraints over unbounded strings. In: Lutz, C., Ranise, S. (eds.) FroCoS 2015. LNCS (LNAI), vol. 9322, pp. 135–150. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24246-0_9

    Chapter  Google Scholar 

  28. Loring, B., Mitchell, D., Kinder, J.: Sound regular expression semantics for dynamic symbolic execution of JavaScript. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, 22–26 June 2019, pp. 425–438. ACM (2019). https://doi.org/10.1145/3314221.3314645

  29. Minsky, M.L.: Computation: Finite and Infinite Machines. Prentice-Hall Series in Automatic Computation. Prentice-Hall (1967)

    Google Scholar 

  30. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24

    Chapter  Google Scholar 

  31. Saxena, P., Akhawe, D., Hanna, S., Mao, F., McCamant, S., Song, D.: A symbolic execution framework for JavaScript. In: 2010 IEEE Symposium on Security and Privacy, pp. 513–528 (2010). https://doi.org/10.1109/SP.2010.38

  32. Seidl, H., Schwentick, T., Muscholl, A., Habermehl, P.: Counting in trees for free. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 1136–1149. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27836-8_94

    Chapter  Google Scholar 

  33. Turoňová, L., Holík, L., Lengál, O., Saarikivi, O., Veanes, M., Vojnar, T.: Regex matching with counting-set automata. Proc. ACM Program. Lang. 4(OOPSLA) (2020). https://doi.org/10.1145/3428286

  34. Verma, K.N., Seidl, H., Schwentick, T.: On the complexity of equational horn clauses. In: Nieuwenhuis, R. (ed.) CADE 2005. LNCS (LNAI), vol. 3632, pp. 337–352. Springer, Heidelberg (2005). https://doi.org/10.1007/11532231_25

    Chapter  Google Scholar 

  35. Wang, H.E., Chen, S.Y., Yu, F., Jiang, J.H.R.: A symbolic model checking approach to the analysis of string and length constraints. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, pp. 623–633. ACM (2018). https://doi.org/10.1145/3238147.3238189

  36. Wang, P., Stolee, K.T.: How well are regular expressions tested in the wild? In: Leavens, G.T., Garcia, A., Pasareanu, C.S. (eds.) Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, 04–09 November 2018, pp. 668–678. ACM (2018)

    Google Scholar 

  37. Zheng, Y., Ganesh, V., Subramanian, S., Tripp, O., Dolby, J., Zhang, X.: Effective search-space pruning for solvers of string equations, regular expressions and length constraints. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 235–254. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_14

    Chapter  Google Scholar 

  38. Zheng, Y., Zhang, X., Ganesh, V.: Z3-str: a Z3-based string solver for web application analysis. In: ESEC/SIGSOFT FSE, pp. 114–124 (2013). https://doi.org/10.1145/2491411.2491456

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Zhilin Wu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Hu, D., Wu, Z. (2024). String Constraints with Regex-Counting and String-Length Solved More Efficiently. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_1

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-8664-4_1

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8663-7

  • Online ISBN: 978-981-99-8664-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics