Abstract
In recent years, string solvers have become an essential component in many formal verification, security analysis, and bug-finding tools. Such solvers typically support a theory of string equations, the length function, and the regular-expression membership predicate. These enable considerable expressive power, which comes at the cost of slow solving time, and in some cases even non-termination. We present three techniques, designed for word-based SMT string solvers, to mitigate these problems: (1) detecting overlapping variables, which is essential to avoiding common cases of non-termination; (2) pruning of the search space via bi-directional integration between the string and integer theories, enabling new cross-domain heuristics; and (3) a binary search based heuristic, allowing the procedure to skip unnecessary string length queries and converge on consistent length assignments faster for large strings. We have implemented above techniques atop the Z3-str solver, resulting in a significantly more robust and efficient solver, dubbed Z3str2, for the quantifier-free theory of string equations, the regular-expression membership predicate, and linear arithmetic over the length function. We report on a series of experiments over four sets of challenging real-world benchmarks, where we compare Z3str2 with five different string solvers: S3, CVC4, Kaluza, PISA and Stranger. Each of these tools utilizes a different solving strategy and/or string representation (based e.g. on words, bit vectors or automata). The results point to the efficacy of our proposed techniques, which yield dramatic performance improvement. We also demonstrate performance improvements enabled by Z3str2 in the context of symbolic execution for string-manipulating programs. We observe that the techniques presented here are of broad applicability, and can be integrated into other string solvers to improve their performance.
Similar content being viewed by others
Notes
The Z3str2 code, as well as the data pertaining to our experiments, are all available at [1].
This restriction is made only for consistency with the SMT-LIB standard [7] in a typical implementation. The only integer literals allowed in the SMT-LIB syntax are 0 and positive integers. Note that both the SMT-LIB standard and the syntax we give here still allow negative integer constants to be expressed as, for example, (- 0 3) for \(-3\).
While we do support AND–OR combination of word equations, without loss of generality it is sufficient to describe the procedure only for a conjunction of word equations.
References
Z3str2 String Constraint Solver. https://sites.google.com/site/z3strsolver/
IBM Security AppScan Source. http://www-03.ibm.com/software/products/en/appscan-source
Kausler Suite. https://github.com/BoiseState/string-constraint-solvers
LibStranger. https://github.com/vlab-cs-ucsb/LibStranger
Personal communications with the Stranger team. 2015
Abdulla PA, Atig MF, Chen Y-F, Holík L, Rezine A, Rümmer P, Stenman J (2014) String constraints for verification. In: Proceedings of the 26th international conference on computer aided verification, CAV’14, pp 150–166
Barrett C, Fontaine P, Tinelli C (2016) The Satisfiability Modulo Theories Library (SMT-LIB). http://www.SMT-LIB.org
Bjørner N, Tillmann N, Voronkov A (2009) Path feasibility analysis for string-manipulating programs. In: Proceedings of the 15th international conference on tools and algorithms for the construction and analysis of systems, TACAS ’09, pp 307–321
Cristian C, Daniel D, Dawson E (2008) Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: Proceedings of the 8th USENIX conference on operating systems design and implementation, OSDI’08. USENIX Association, Berkeley, pp 209–224
Chipounov V, Kuznetsov V, Candea G (2011) S2e: a platform for in-vivo multi-path analysis of software systems. In: Proceedings of the sixteenth international conference on architectural support for programming languages and operating systems. ASPLOS XVI. ACM, New York, pp 265–278
Christensen AS, Møller A, Schwartzbach MI (2003) Precise analysis of string expressions. In: Proceedings of the 10th international conference on static analysis, SAS’03, pp 1–18
De Moura L, Bjørner N (2008) Z3: an efficient smt solver. In: Proceedings of the theory and practice of software, 14th international conference on Tools and algorithms for the construction and analysis of systems, TACAS’08, pp 337–340
Ganesh V, Dill DL (2007) A decision procedure for bit-vectors and arrays. In: Proceedings of the 19th international conference on computer aided verification, CAV’07, pp 519–531
Ganesh V, Minnes M, Solar-Lezama A, Rinard M (2012) Word equations with length constraints: what’s decidable? In: HVC’12
Ghosh I, Shafiei N, Li G, Chiang W-F (2013) JST: an automatic test generation tool for industrial java applications with strings. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 992–1001
Hooimeijer P, Weimer W (2010) Solving string constraints lazily. In: Proceedings of the IEEE/ACM international conference on automated software engineering, ASE ’10, pp 377–386
Hopcroft JE, Motwani R, Ullman JD (2007) Introduction to automata theory, languages, and computation. Pearson/Addison Wesley, Reading
Jeon J, Qiu X, Fetter-Degges J, Foster JS, Solar-Lezama A (2016) Synthesizing framework models for symbolic execution. In: Proceedings of the 38th international conference on software engineering, ICSE ’16. ACM, New York, pp 156–167
Jeż A (2013) Recompression: word equations and beyond. In: Developments in language theory, Lecture Notes in Computer Science, pp 12–26
Karhumäki J, Mignosi F, Plandowski W (2000) The expressibility of languages and relations by word equations. J ACM 47(3):483–505
Kausler S (2014) Evaluation of string constraint solvers using dynamic symbolic execution. Master’s Thesis, Boise State University
Kausler S, Sherman E (2014) Evaluation of string constraint solvers in the context of symbolic execution. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering, ASE ’14. ACM, New York, pp 259–270
Kiezun A, Ganesh V, Guo PJ, Hooimeijer P, Ernst MD (2009) Hampi: a solver for string constraints. In: Proceedings of the eighteenth international symposium on software testing and analysis, ISSTA ’09, pp 105–116
Li G, Andreasen E, Ghosh I (2014) SymJS: automatic symbolic testing of javascript web applications. In: Proceedings of the 22Nd ACM SIGSOFT international symposium on foundations of software engineering, FSE 2014, pp 449–459
Li G, Ghosh I (2013) PASS: string solving with parameterized array and interval automaton. In: 9th international Haifa verification conference, HVC ’13, pp 15–31
Liang T, Reynolds A, Tinelli C, Barrett C, Deters M (2014) A dpll(t) theory solver for a theory of strings and regular expressions. In: Proceedings of the 26th international conference on computer aided verification, CAV’14, pp 646–662
Makanin GS (1977) The problem of solvability of equations in a free semigroup. Math Sbornik 103:147–236 (1977). English transl. in Math USSR Sbornik 32
Matiyasevich Y (2007) Word equations, Fibonacci numbers, and Hilbert’s tenth problem. In: Workshop on Fibonacci words
Plandowski W (2004) Satisfiability of word equations with constants is in pspace. J ACM 51(3):483–496
Plandowski W (2006) An efficient algorithm for solving word equations. In: Proceedings of the 38th annual ACM symposium on theory of computing, STOC ’06, pp 467–476
Qu X, Robinson B (2011) A case study of concolic testing tools and their limitations. In: 2011 international symposium on empirical software engineering and measurement (ESEM). IEEE Computer Society, pp 117–126
Redelinghuys G, Visser W, Geldenhuys J (2012) Symbolic execution of programs with strings. In: Proceedings of the South African Institute for computer scientists and information technologists conference, SAICSIT ’12, pp 139–148
Saxena P, Akhawe D, Hanna S, Mao F, McCamant S, Song D (2010) A symbolic execution framework for javascript. In: Proceedings of the 2010 IEEE symposium on security and privacy, SP ’10, pp 513–528
Schulz K (1992) Makanin’s algorithm for word equations-two improvements and a generalization. In: Schulz K (ed) Word equations and related topics, Lecture Notes in Computer Science, vol 572. Springer, Berlin, pp 85–150
Tateishi T, Pistoia M, Tripp O (2013) Path- and index-sensitive string analysis based on monadic second-order logic. ACM Trans Softw Eng Methodol 22(4):33:1–33:33
Trinh M-T, Chu D-H, Jaffar J (2014) S3: a symbolic string solver for vulnerability detection in web applications. In: Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, CCS ’14, pp 1232–1243
Xie X, Liu Y, Le W, Li X, Chen H (2015) S-looper: automatic summarization for multipath string loops. In: Proceedings of the 2015 international symposium on software testing and analysis, ISSTA 2015. ACM, New York, pp 188–198
Yu F, Alkhalaf M, Bultan T (2010) Stranger: an automata-based string analysis tool for php. In: Proceedings of the 16th international conference on tools and algorithms for the construction and analysis of systems, TACAS’10, pp 154–157
Yu F, Bultan T, Ibarra OH (2009) Symbolic string verification: combining string analysis and size analysis. In: Proceedings of the 15th international conference on tools and algorithms for the construction and analysis of systems, TACAS ’09, pp 322–336
Zheng Y, Ganesh V, Subramanian S, Tripp O, Dolby J, Zhang X (2015) Effective search-space pruning for solvers of string equations, regular expressions and length constraints. Technical report. https://sites.google.com/site/z3strsolver/publications
Zheng Y, Zhang X, Ganesh V (2013) Z3-str: a z3-based string solver for web application analysis. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, ESEC/FSE 2013, pp 114–124
Author information
Authors and Affiliations
Corresponding author
Additional information
O. Tripp: The research leading to this manuscript, and earlier versions of the manuscript, were completed while the author was at IBM Research.
Rights and permissions
About this article
Cite this article
Zheng, Y., Ganesh, V., Subramanian, S. et al. Z3str2: an efficient solver for strings, regular expressions, and length constraints. Form Methods Syst Des 50, 249–288 (2017). https://doi.org/10.1007/s10703-016-0263-6
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10703-016-0263-6