
Automata-based symbolic string analysis for vulnerability detection

Formal Methods in System Design

Abstract

Verifying string-manipulating programs is a crucial problem in computer security. String operations are used extensively within web applications to manipulate user input, and their erroneous use is the most common cause of security vulnerabilities in web applications. We present an automata-based approach for symbolic analysis of string-manipulating programs. We use deterministic finite automata (DFAs) to represent the possible values of string variables. Using forward reachability analysis, we compute an over-approximation of all possible values that string variables can take at each program point. Intersecting these over-approximations with a given attack pattern yields the potential attack strings if the program is vulnerable. Based on these techniques, we have implemented Stranger, an automata-based string analysis tool for detecting string-related security vulnerabilities in PHP applications. We evaluated Stranger on several open-source web applications, including one with more than 350,000 lines of code. Stranger is able to detect both known and previously unknown vulnerabilities and, after proper sanitization routines are inserted, to prove the absence of vulnerabilities with respect to given attack patterns.
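The intersection step described in the abstract can be illustrated with a minimal sketch. This is not Stranger's implementation: the `DFA` class, the two-symbol alphabet, and the simplified attack pattern (any string containing `<`) are hypothetical choices made only to show how a product construction and an emptiness check produce either a candidate attack string or evidence that none exists.

```python
from collections import deque


class DFA:
    """Deterministic finite automaton; a missing transition means reject."""

    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta  # dict: (state, symbol) -> state


def intersect(a, b):
    """Product construction: accepts exactly the strings both a and b accept."""
    start = (a.start, b.start)
    delta, accepting = {}, set()
    seen, queue = {start}, deque([start])
    # Only symbols known to both automata can appear in a common string.
    symbols = {sym for (_, sym) in a.delta} & {sym for (_, sym) in b.delta}
    while queue:
        state = queue.popleft()
        p, q = state
        if p in a.accepting and q in b.accepting:
            accepting.add(state)
        for sym in symbols:
            if (p, sym) in a.delta and (q, sym) in b.delta:
                nxt = (a.delta[(p, sym)], b.delta[(q, sym)])
                delta[(state, sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return DFA(start, accepting, delta)


def witness(dfa):
    """Return a shortest accepted string, or None if the language is empty."""
    queue, seen = deque([(dfa.start, "")]), {dfa.start}
    while queue:
        state, word = queue.popleft()
        if state in dfa.accepting:
            return word
        for (src, sym), dst in dfa.delta.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, word + sym))
    return None


# Hypothetical attack pattern: any string over {"a", "<"} that contains "<".
attack = DFA(0, {1}, {(0, "a"): 0, (0, "<"): 1, (1, "a"): 1, (1, "<"): 1})

# Over-approximation of an unsanitized variable: every string over {"a", "<"}.
unsanitized = DFA(0, {0}, {(0, "a"): 0, (0, "<"): 0})

# Over-approximation after a (hypothetical) sanitizer that removes "<".
sanitized = DFA(0, {0}, {(0, "a"): 0})

attack_string = witness(intersect(unsanitized, attack))
safe_result = witness(intersect(sanitized, attack))
```

In this sketch, a non-`None` witness is a potential attack string, while `None` means the over-approximated value set is disjoint from the attack pattern, which is the sense in which the absence of a vulnerability is shown with respect to that pattern.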

Author information

Corresponding author

Correspondence to Fang Yu.

About this article

Cite this article

Yu, F., Alkhalaf, M., Bultan, T. et al. Automata-based symbolic string analysis for vulnerability detection. Form Methods Syst Des 44, 44–70 (2014). https://doi.org/10.1007/s10703-013-0189-1

  • DOI: https://doi.org/10.1007/s10703-013-0189-1
