Abstract
An increasing number of applications in verification and security rely on or could benefit from automatic solvers that can check the satisfiability of constraints over a diverse set of data types that includes character strings. Until recently, satisfiability solvers for strings were standalone tools that could reason only about fairly restricted fragments of the theory of strings and regular expressions (e.g., strings of bounded lengths). These solvers were based on reductions to satisfiability problems over other data types such as bit vectors or to automata decision problems. We present a set of algebraic techniques for solving constraints over a rich theory of unbounded strings natively, without reduction to other problems. These techniques can be used to integrate string reasoning into general, multi-theory SMT solvers based on the common DPLL(T) architecture. We have implemented them in our SMT solver cvc4, expanding its already large set of built-in theories to include a theory of strings with concatenation, length, and membership in regular languages. This implementation makes cvc4 the first solver able to accept a rich set of mixed constraints over strings, integers, reals, arrays and algebraic datatypes. Our initial experimental results show that, in addition, on pure string problems cvc4 is highly competitive with specialized string solvers accepting a comparable input language.
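To make the kind of input concrete, the following is a minimal sketch of one mixed constraint over strings and integers that combines concatenation, length, and membership in a regular language. The constraint itself, the toy alphabet, and the names `satisfies`, `strings_up_to`, and `bounded_search` are illustrative assumptions and do not come from the paper. The search is a naive enumeration up to a length bound, that is, the restricted style of reasoning the abstract attributes to earlier tools; it is not the algebraic procedure for unbounded strings implemented in cvc4.

```python
import re
from itertools import product

ALPHABET = "ab"  # assumed toy alphabet, chosen only for illustration


def satisfies(x: str, y: str, n: int) -> bool:
    """Hypothetical mixed constraint:
    x = y . "b"  and  len(x) = n  and  n > 2  and  y in (ab)*
    """
    return (
        x == y + "b"                                # concatenation
        and len(x) == n                             # length
        and n > 2                                   # integer arithmetic
        and re.fullmatch(r"(ab)*", y) is not None   # regular membership
    )


def strings_up_to(bound: int):
    """Yield every string over ALPHABET of length 0..bound, shortest first."""
    for length in range(bound + 1):
        for chars in product(ALPHABET, repeat=length):
            yield "".join(chars)


def bounded_search(bound: int):
    """Return the first (x, y, n) satisfying the constraint with
    len(x) <= bound and len(y) <= bound, or None if there is none."""
    for y in strings_up_to(bound):
        for x in strings_up_to(bound):
            n = len(x)
            if satisfies(x, y, n):
                return x, y, n
    return None


if __name__ == "__main__":
    print(bounded_search(3))  # ('abb', 'ab', 3)
    print(bounded_search(2))  # None
```

With a bound of 3 the search finds x = "abb", y = "ab", n = 3. With a bound of 2 it finds nothing even though the constraint is satisfiable, which is the kind of incompleteness that reasoning over unbounded strings avoids.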








Notes
Personal communication. norn was not publicly available at the time of this writing.
We do not specify those additional symbols here because solving membership constraints is not the focus of this paper.
This difference is not substantial if the arithmetic solver treats \((\mathsf {len}\,x)\) like an integer variable.
Such equations can always be added as needed using fresh variables without changing the satisfiability of the original problem.
If x occurs only in \(A_0\), necessarily in a term of the form \(\mathsf {len}\,x\), the whole term can be replaced by a fresh arithmetic variable.
F-Loop introduces regular expressions, which are not the focus of this work.
Refer back to Fig. 7 for a definition of \( len _{b}\).
Observe that \(N\,e\) is defined by point (ii) of Definition 6.
It is at most recomputed from scratch after an application of Reset.
cvc4 is publicly available at http://cvc4.cs.nyu.edu/.
For many benchmarks, norn, which runs on the Java virtual machine, crashed when executed on StarExec because of insufficient resources in the JVM. Also, for satisfiable problems, norn returns solutions in a non-standard format, which made it difficult for us to validate those models.
Both the solvers and the results are available, by logging in as a guest user, at https://www.starexec.org/starexec/secure/details/job.jsp?id=6875 (cvc4) and https://www.starexec.org/starexec/secure/details/job.jsp?id=6891 (z3-str and s3).
The Kaluza documentation does not specify the meaning of the function when its second argument, an integer, is greater than 0.
The SMT-LIB 2 standard does not yet include a theory of strings, although there are plans to add one. cvc4’s extension is documented at http://cvc4.cs.nyu.edu/wiki/Strings.
References
Abdulla PA, Atig MF, Chen YF, Holik L, Rezine A, Rummer P, Stenman J (2014) String constraints for verification. In: Biere A, Bloem R (eds) Proceedings of the 26th international conference on computer aided verification. Lecture notes in computer science, vol 8559. Springer, Berlin
Barrett C, Nieuwenhuis R, Oliveras A, Tinelli C (2006) Splitting on demand in SAT modulo theories. In: Proceedings of LPAR’06. Lecture notes in computer science, vol. 4246. Springer, Berlin, pp 512–526
Barrett C, Sebastiani R, Seshia S, Tinelli C (2009) Satisfiability modulo theories. In: Biere A, Heule MJH, van Maaren H, Walsh T (eds) Handbook of satisfiability, vol 185, chap 26. IOS Press, Amsterdam, pp 825–885
Bjørner N, Tillmann N, Voronkov A (2009) Path feasibility analysis for string-manipulating programs. In: Proceedings of the 15th international conference on tools and algorithms for the construction and analysis of systems. Lecture notes in computer science. Springer, pp 307–321
Brumley D, Caballero J, Liang Z, Newsome J (2007) Towards automatic discovery of deviations in binary implementations with applications to error detection and fingerprint generation. In: Proceedings of the 16th USENIX security symposium, Boston, MA, USA, 6–10 August 2007
Brumley D, Wang H, Jha S, Song DX (2007) Creating vulnerability signatures using weakest preconditions. In: 20th IEEE computer security foundations symposium, CSF 2007, 6–8 July 2007, Venice, Italy, pp 311–325
Christensen AS, Møller A, Schwartzbach MI (2003) Precise analysis of string expressions. In: Proceedings of the 10th international conference on static analysis. Lecture notes in computer science. Springer, Berlin, pp 1–18
De Moura L, Bjørner N (2008) Z3: an efficient SMT solver. In: Proceedings of the theory and practice of software, 14th international conference on tools and algorithms for the construction and analysis of systems. Lecture notes in computer science. Springer, Berlin, pp 337–340
Egele M, Kruegel C, Kirda E, Yin H, Song D (2007) Dynamic spyware analysis. In: Proceedings of the 2007 USENIX annual technical conference, ATC’07. USENIX Association, Berkeley, CA, USA, pp 18:1–18:14
Fu X, Li C (2010) A string constraint solver for detecting web application vulnerability. In: Proceedings of the 22nd international conference on software engineering and knowledge engineering, SEKE’2010. Knowledge Systems Institute Graduate School, Skokie
Ganesh V, Minnes M, Solar-Lezama A, Rinard M (2013) Word equations with length constraints: what’s decidable? In: Proceedings of the 8th international conference on hardware and software: verification and testing, HVC’12. Springer, Berlin, pp 209–226
Ghosh I, Shafiei N, Li G, Chiang W (2013) JST: an automatic test generation tool for industrial Java applications with strings. In: Proceedings of the 2013 international conference on software engineering, ICSE’13. IEEE Press, Piscataway, pp 992–1001
Hooimeijer P, Veanes M (2011) An evaluation of automata algorithms for string analysis. In: Proceedings of the 12th international conference on verification, model checking, and abstract interpretation. Springer, Berlin, pp 248–262
Hooimeijer P, Weimer W (2009) A decision procedure for subset constraints over regular languages. In: Proceedings of the 2009 ACM SIGPLAN conference on programming language design and implementation. ACM, Dublin, pp 188–198
Hooimeijer P, Weimer W (2010) Solving string constraints lazily. In: Proceedings of the IEEE/ACM international conference on automated software engineering. ACM, New York, pp 377–386
Kiezun A, Ganesh V, Guo PJ, Hooimeijer P, Ernst MD (2009) HAMPI: a solver for string constraints. In: Proceedings of the eighteenth international symposium on Software testing and analysis. ACM, New York, pp 105–116
Li G, Ghosh I (2013) PASS: string solving with parameterized array and interval automaton. In: Bertacco V, Legay A (eds) Hardware and software: verification and testing. Lecture notes in computer science, vol 8244. Springer, Berlin, pp 15–31
Liang T, Reynolds A, Tinelli C, Barrett C, Deters M (2014) A DPLL(T) theory solver for a theory of strings and regular expressions. In: Biere A, Bloem R (eds) Proceedings of the 26th international conference on computer aided verification. Lecture notes in computer science, vol 8559. Springer, Berlin
Liang T, Tsiskaridze N, Reynolds A, Tinelli C, Barrett C (2015) A decision procedure for regular membership and length constraints over unbounded strings. In: Frontiers of combining systems. Springer, Berlin, pp 135–150
Makanin GS (1977) The problem of solvability of equations in a free semigroup. English transl. in Math USSR Sbornik, vol 32, pp 147–236
Namjoshi KS, Narlikar GJ (2010) Robust and fast pattern matching for intrusion detection. In: INFOCOM 2010. 29th IEEE international conference on computer communications, joint conference of the IEEE computer and communications societies, 15–19 March 2010, San Diego, CA, USA, pp 740–748
Nelson G, Oppen DC (1979) Simplification by cooperating decision procedures. ACM Trans Program Lang Syst 1(2):245–257
Nieuwenhuis R, Oliveras A, Tinelli C (2006) Solving SAT and SAT Modulo theories: from an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T). J ACM 53(6):937–977
Perrin D (1989) Equations in words. In: Ait-Kaci H, Nivat M (eds) Resolution of equations in algebraic structures, vol 2. Academic Press, Cambridge, pp 275–298
Plandowski W (2004) Satisfiability of word equations with constants is in PSPACE. J ACM 51(3):483–496
Saxena P, Akhawe D (2010) Kaluza web site. http://webblaze.cs.berkeley.edu/2010/kaluza/
Saxena P, Akhawe D, Hanna S, Mao F, McCamant S, Song D (2010) A symbolic execution framework for JavaScript. In: Proceedings of the 2010 IEEE symposium on security and privacy. IEEE Computer Society, pp 513–528
Stump A, Sutcliffe G, Tinelli C (2014) Starexec: a cross-community infrastructure for logic solving. In: Demri S, Kapur D, Weidenbach C (eds) Proceedings of the 7th international joint conference on automated reasoning. Lecture notes in artificial intelligence. Springer, Berlin
Tillmann N, Halleux J (2008) Pex—white box test generation for .NET. In: Beckert B, Hähnle R (eds) Tests and proofs. Lecture notes in computer science, vol 4966. Springer, Berlin, pp 134–153
Tinelli C, Harandi MT (1996) A new correctness proof of the Nelson–Oppen combination procedure. In: Baader F, Schulz KU (eds) Frontiers of combining systems: proceedings of the 1st international workshop (Munich, Germany). Applied logic. Kluwer Academic Publishers, Dordrecht, pp 103–120
Trinh MT, Chu DH, Jaffar J (2014) S3: a symbolic string solver for vulnerability detection in web applications. In: Yung M, Li N (eds) Proceedings of the 21st ACM conference on computer and communications security. ACM, New York, pp 1232–1243
Veanes M (2013) Applications of symbolic finite automata. In: Proceedings of the 18th international conference on implementation and application of automata, CIAA’13. Springer, Berlin, pp 16–23
Veanes M, Bjørner N, De Moura L (2010) Symbolic automata constraint solving. In: Proceedings of the 17th international conference on logic for programming, artificial intelligence, and reasoning. Lecture notes in computer science. Springer, Berlin, pp 640–654
Yu F, Alkhalaf M, Bultan T (2010) Stranger: an automata-based string analysis tool for PHP. In: Esparza J, Majumdar R (eds) Tools and algorithms for the construction and analysis of systems. Lecture notes in computer science, vol 6015. Springer, Berlin, pp 154–157
Zheng Y, Zhang X, Ganesh V (2013) Z3-str: a Z3-based string solver for web application analysis. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering, ESEC/FSE 2013. ACM, New York, pp 114–124
Acknowledgments
We thank the developers of z3-str for their technical support in using their tool and several clarifications on it, as well as for their prompt response to our inquiries. We also express our gratitude to the developers of the StarExec service for their assistance and for implementing additional features we requested while running our experimental evaluation on the service. Finally, we thank the anonymous reviewers for their supportive comments and for their valuable suggestions on improving the paper’s presentation. The work described here was partially funded by NSF Grants #1228765 and #1228768. The second author was also supported in part by the European Research Council (ERC) Project Implicit Programming.
Additional information
This paper is dedicated to the memory of Morgan Deters who died unexpectedly in January 2015.
Cite this article
Liang, T., Reynolds, A., Tsiskaridze, N. et al. An efficient SMT solver for string constraints. Form Methods Syst Des 48, 206–234 (2016). https://doi.org/10.1007/s10703-016-0247-6
DOI: https://doi.org/10.1007/s10703-016-0247-6