Abstract
Regular expressions (regex for short) and string-length function are widely used in string-manipulating programs. Counting is a frequently used feature in regexes that counts the number of matchings of sub-patterns. The state-of-the-art string solvers are incapable of solving string constraints with regex-counting and string-length efficiently, especially when the counting and length bounds are large. In this work, we propose an automata-theoretic approach for solving such class of string constraints. The main idea is to symbolically model the counting operators by registers in automata instead of unfolding them explicitly, thus alleviating the state explosion problem. Moreover, the string-length function is modeled by a register as well. As a result, the satisfiability of string constraints with regex-counting and string-length is reduced to the satisfiability of linear integer arithmetic, which the off-the-shelf SMT solvers can then solve. To improve the performance further, we also propose techniques to reduce the sizes of automata. We implement the algorithms and validate our approach on 48,843 benchmark instances. The experimental results show that our approach can solve more instances than the state-of-the-art solvers, at a comparable or faster speed, especially when the counting and length bounds are large.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
In the rest of this paper, for clarity, we use counting to denote expressions of the form \(e^{\{m, n\}}\) and \(e^{\{m, \infty \}}\), but not \(e^*\) or \(e^+\).
- 2.
Initially, we used the majority vote of the results of the solvers as the ground truth. Nevertheless, on some problem instances, all the results of the three solvers in the Z3 family are wrong (after manual inspection), thus failing this approach on these instances.
References
Abdulla, P.A., et al.: Efficient handling of string-number conversion. In: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2020, pp. 943–957. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3385412.3386034
Barbosa, H., et al.: CVC5: a versatile and industrial-strength SMT solver. In: TACAS 2022. LNCS, vol. 13243, pp. 415–442. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-99524-9_24
Berzish, M.: Z3str4: a solver for theories over strings. Ph.D. thesis, University of Waterloo, Ontario, Canada (2021). https://hdl.handle.net/10012/17102
Berzish, M., et al.: Towards more efficient methods for solving regular-expression heavy string constraints. Theor. Comput. Sci. 943, 50–72 (2023)
Berzish, M., Ganesh, V., Zheng, Y.: Z3str3: a string solver with theory-aware heuristics. In: 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, 2–6 October, pp. 55–59 (2017). https://doi.org/10.23919/FMCAD.2017.8102241
Berzish, M., et al.: An SMT solver for regular expressions and linear arithmetic over string length. In: Silva, A., Leino, K.R.M. (eds.) CAV 2021. LNCS, vol. 12760, pp. 289–312. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81688-9_14
Bui, D., contributors: Z3-trau (2019). https://github.com/diepbp/z3-trau
Cavada, R., et al.: The nuXmv symbolic model checker. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 334–342. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_22
Chapman, C., Stolee, K.T.: Exploring regular expression usage and context in Python. In: Zeller, A., Roychoudhury, A. (eds.) Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, 18–20 July 2016, pp. 282–293. ACM (2016). https://doi.org/10.1145/2931037.2931073
Chen, H., Lu, P.: Checking determinism of regular expressions with counting. Inf. Comput. 241, 302–320 (2015). https://doi.org/10.1016/j.ic.2014.12.001
Chen, T., Chen, Y., Hague, M., Lin, A.W., Wu, Z.: What is decidable about string constraints with the replaceall function. PACMPL 2(POPL), 3:1–3:29 (2018). https://doi.org/10.1145/3158091
Chen, T., et al.: Solving string constraints with regex-dependent functions through transducers with priorities and variables. Proc. ACM Program. Lang. 6(POPL), 1–31 (2022). https://doi.org/10.1145/3498707
Chen, T., et al.: A decision procedure for path feasibility of string manipulating programs with integer data type. In: Hung, D.V., Sokolsky, O. (eds.) ATVA 2020. LNCS, vol. 12302, pp. 325–342. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59152-6_18
Chen, T., Hague, M., Lin, A.W., Rümmer, P., Wu, Z.: Decision procedures for path feasibility of string-manipulating programs with complex operations. PACMPL 3(POPL) (2019). https://doi.org/10.1145/3290362
D’Antoni, L.: Automatark: automata benchmark (2018). https://github.com/lorisdanto/automatark
D’Antoni, L., Ferreira, T., Sammartino, M., Silva, A.: Symbolic register automata. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 3–21. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25540-4_1
Davis, J.C., Coghlan, C.A., Servant, F., Lee, D.: The impact of regular expression denial of service (ReDoS) in practice: an empirical study at the ecosystem scale. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2018, pp. 246–256. Association for Computing Machinery, New York (2018)
Davis, J.C., Michael IV, L.G., Coghlan, C.A., Servant, F., Lee, D.: Why aren’t regular expressions a lingua franca? An empirical study on the re-use and portability of regular expressions. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019, pp. 443–454. Association for Computing Machinery, New York (2019). https://doi.org/10.1145/3338906.3338909
Gelade, W., Gyssens, M., Martens, W.: Regular expressions with counting: weak versus strong determinism. SIAM J. Comput. 41(1), 160–190 (2012). https://doi.org/10.1137/100814196
Haase, C.: A survival guide to Presburger arithmetic. ACM SIGLOG News 5(3), 67–82 (2018). https://doi.org/10.1145/3242953.3242964
Holík, L., Síc, J., Turonová, L., Vojnar, T.: Fast matching of regular patterns with synchronizing counting. In: Kupferman, O., Sobocinski, P. (eds.) FoSSaCS 2023. LNCS, vol. 13992, pp. 392–412. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-30829-1_19
Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Boston (1979)
Kaminski, M., Francez, N.: Finite-memory automata. In: Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science, vol. 2, pp. 683–688 (1990). https://doi.org/10.1109/FSCS.1990.89590
Kulczynski, M., Manea, F., Nowotka, D., Poulsen, D.B.: ZaligVinder: a generic test framework for string solvers. J. Softw. Evol. Process 35(4), e2400 (2023). https://doi.org/10.1002/smr.2400
Le Glaunec, A., Kong, L., Mamouras, K.: Regular expression matching using bit vector automata. Proc. ACM Program. Lang. 7(OOPSLA1) (2023). https://doi.org/10.1145/3586044
Liang, T., Reynolds, A., Tinelli, C., Barrett, C., Deters, M.: A DPLL(T) theory solver for a theory of strings and regular expressions. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 646–662. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9_43
Liang, T., Tsiskaridze, N., Reynolds, A., Tinelli, C., Barrett, C.: A decision procedure for regular membership and length constraints over unbounded strings. In: Lutz, C., Ranise, S. (eds.) FroCoS 2015. LNCS (LNAI), vol. 9322, pp. 135–150. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24246-0_9
Loring, B., Mitchell, D., Kinder, J.: Sound regular expression semantics for dynamic symbolic execution of JavaScript. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, 22–26 June 2019, pp. 425–438. ACM (2019). https://doi.org/10.1145/3314221.3314645
Minsky, M.L.: Computation: Finite and Infinite Machines. Prentice-Hall Series in Automatic Computation. Prentice-Hall (1967)
de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78800-3_24
Saxena, P., Akhawe, D., Hanna, S., Mao, F., McCamant, S., Song, D.: A symbolic execution framework for JavaScript. In: 2010 IEEE Symposium on Security and Privacy, pp. 513–528 (2010). https://doi.org/10.1109/SP.2010.38
Seidl, H., Schwentick, T., Muscholl, A., Habermehl, P.: Counting in trees for free. In: Díaz, J., Karhumäki, J., Lepistö, A., Sannella, D. (eds.) ICALP 2004. LNCS, vol. 3142, pp. 1136–1149. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27836-8_94
Turoňová, L., Holík, L., Lengál, O., Saarikivi, O., Veanes, M., Vojnar, T.: Regex matching with counting-set automata. Proc. ACM Program. Lang. 4(OOPSLA) (2020). https://doi.org/10.1145/3428286
Verma, K.N., Seidl, H., Schwentick, T.: On the complexity of equational horn clauses. In: Nieuwenhuis, R. (ed.) CADE 2005. LNCS (LNAI), vol. 3632, pp. 337–352. Springer, Heidelberg (2005). https://doi.org/10.1007/11532231_25
Wang, H.E., Chen, S.Y., Yu, F., Jiang, J.H.R.: A symbolic model checking approach to the analysis of string and length constraints. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, pp. 623–633. ACM (2018). https://doi.org/10.1145/3238147.3238189
Wang, P., Stolee, K.T.: How well are regular expressions tested in the wild? In: Leavens, G.T., Garcia, A., Pasareanu, C.S. (eds.) Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2018, Lake Buena Vista, FL, USA, 04–09 November 2018, pp. 668–678. ACM (2018)
Zheng, Y., Ganesh, V., Subramanian, S., Tripp, O., Dolby, J., Zhang, X.: Effective search-space pruning for solvers of string equations, regular expressions and length constraints. In: Kroening, D., Păsăreanu, C.S. (eds.) CAV 2015. LNCS, vol. 9206, pp. 235–254. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21690-4_14
Zheng, Y., Zhang, X., Ganesh, V.: Z3-str: a Z3-based string solver for web application analysis. In: ESEC/SIGSOFT FSE, pp. 114–124 (2013). https://doi.org/10.1145/2491411.2491456
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hu, D., Wu, Z. (2024). String Constraints with Regex-Counting and String-Length Solved More Efficiently. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_1
Download citation
DOI: https://doi.org/10.1007/978-981-99-8664-4_1
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8663-7
Online ISBN: 978-981-99-8664-4
eBook Packages: Computer ScienceComputer Science (R0)