
Static Analysis of String Encoders and Decoders

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7737)

Abstract

There has been significant interest in static analysis of programs that manipulate strings, in particular in the context of web security. Many types of security vulnerabilities are exposed through flaws in programs such as string encoders, decoders, and sanitizers. Recent work has focused on combining automata and satisfiability modulo theories (SMT) techniques to address security issues in those programs. These techniques scale to large alphabets such as Unicode, which is the de facto character encoding standard used in web software.
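One reason symbolic techniques scale to an alphabet as large as Unicode is that a transition is labeled by a predicate over characters rather than by one edge per character, so the analysis only needs Boolean operations on predicates and a satisfiability check. The following Python sketch assumes the alphabet is the 16-bit UTF-16 code-unit range and represents predicates as sorted interval lists, which makes those operations cheap. The name `CharPred` and the interval representation are illustrative assumptions that stand in for the SMT-based predicate reasoning described above.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_CHAR = 0xFFFF  # assumption: the alphabet is the set of UTF-16 code units


@dataclass(frozen=True)
class CharPred:
    """Character predicate stored as sorted, disjoint, non-adjacent closed intervals."""
    ranges: tuple[tuple[int, int], ...]

    @staticmethod
    def of(*ranges: tuple[int, int]) -> CharPred:
        # Normalize: sort the intervals and merge overlapping or adjacent ones.
        merged: list[list[int]] = []
        for lo, hi in sorted(ranges):
            if merged and lo <= merged[-1][1] + 1:
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        return CharPred(tuple((lo, hi) for lo, hi in merged))

    def __or__(self, other: CharPred) -> CharPred:
        return CharPred.of(*self.ranges, *other.ranges)

    def __invert__(self) -> CharPred:
        # Complement with respect to [0, MAX_CHAR]: collect the gaps between intervals.
        out, nxt = [], 0
        for lo, hi in self.ranges:
            if lo > nxt:
                out.append((nxt, lo - 1))
            nxt = hi + 1
        if nxt <= MAX_CHAR:
            out.append((nxt, MAX_CHAR))
        return CharPred(tuple(out))

    def __and__(self, other: CharPred) -> CharPred:
        # De Morgan: a & b == ~(~a | ~b)
        return ~(~self | ~other)

    def is_sat(self) -> bool:
        # A predicate is satisfiable iff it denotes at least one character.
        return bool(self.ranges)

    def size(self) -> int:
        return sum(hi - lo + 1 for lo, hi in self.ranges)

    def __contains__(self, ch: str) -> bool:
        c = ord(ch)
        return any(lo <= c <= hi for lo, hi in self.ranges)


digit = CharPred.of((ord("0"), ord("9")))
letter = CharPred.of((ord("a"), ord("z")), (ord("A"), ord("Z")))
non_ascii = ~CharPred.of((0, 0x7F))
```

Here `non_ascii` covers all 65,408 non-ASCII code units with a single interval, and `(digit & letter).is_sat()` is `False`. A real SFT tool would delegate these checks to a solver so that predicates can come from richer theories than intervals.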

One approach has been to use character predicates to generalize finite state transducers. This technique has made it possible to perform precise analysis of a large class of typical sanitization routines. However, it does not cope well with decoders, which often need to read more than one character at a time. To overcome this limitation, we introduce a conservative generalization of Symbolic Finite Transducers (SFTs), called Extended Symbolic Finite Transducers (ESFTs), that incorporates the notion of a bounded lookahead. We demonstrate the advantage of ESFTs in analyzing programs for which previous approaches did not scale.
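The role of bounded lookahead can be illustrated with a small Python sketch. Each hypothetical `Rule` reads exactly `k` input symbols at once, fires only if its guard holds on all of them, and emits an output computed from them. Combining a UTF-16 surrogate pair into one code point needs `k = 2`; without lookahead, the high surrogate would have to be remembered in the control state. The simplified semantics used here (fire the first applicable rule, no special end-of-input rules) is an assumption for illustration and may differ from the formal ESFT definition in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rule:
    """Hypothetical ESFT rule: read exactly k symbols, check the guard, emit output."""
    source: int
    k: int  # lookahead: number of input symbols consumed by this rule
    guard: Callable[[Sequence[int]], bool]
    output: Callable[[Sequence[int]], list[int]]
    target: int


def run_esft(rules: list[Rule], start: int, finals: set[int],
             inp: Sequence[int]) -> list[int]:
    """Deterministic run that fires the first applicable rule at each step."""
    state, pos, out = start, 0, []
    while pos < len(inp):
        for r in rules:
            window = inp[pos:pos + r.k]
            if r.source == state and len(window) == r.k and r.guard(window):
                out.extend(r.output(window))
                state, pos = r.target, pos + r.k
                break
        else:  # no rule matched the remaining input
            raise ValueError(f"no rule applies at position {pos}")
    if state not in finals:
        raise ValueError("input ended in a non-final state")
    return out


def is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


# UTF-16 code units -> Unicode code points, using a single state 0.
UTF16_TO_CODEPOINTS = [
    # k = 1: any non-surrogate code unit maps to itself.
    Rule(0, 1, lambda w: not (is_high(w[0]) or is_low(w[0])), lambda w: [w[0]], 0),
    # k = 2: a high surrogate followed by a low surrogate forms one code point.
    Rule(0, 2, lambda w: is_high(w[0]) and is_low(w[1]),
         lambda w: [0x10000 + ((w[0] - 0xD800) << 10) + (w[1] - 0xDC00)], 0),
]
```

For instance, `run_esft(UTF16_TO_CODEPOINTS, 0, {0}, [0x48, 0xD83D, 0xDE00])` returns `[0x48, 0x1F600]`, while an unpaired high surrogate such as `[0xD800]` raises `ValueError` because no rule applies.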

In our evaluation we use a UTF-16 to UTF-8 translator (utf8encoder) and a UTF-8 to UTF-16 translator (utf8decoder). We show, among other properties, that utf8encoder and utf8decoder are functionally correct.
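The two translators can be sketched in Python as plain functions over UTF-16 code units and UTF-8 bytes. Note that this is a dynamic, sample-based check, not the static proof described above: it only compares `utf8_encode` against Python's built-in UTF-8 codec and checks that `utf8_decode(utf8_encode(u)) == u` on a few strings. The function names, and the choice to reject unpaired surrogates, overlong forms, and truncated sequences with `ValueError`, are assumptions; the paper's utf8encoder and utf8decoder may handle invalid input differently.

```python
from __future__ import annotations


def to_units(s: str) -> list[int]:
    """Split a Python string into UTF-16 code units (helper for the demo)."""
    b = s.encode("utf-16-le")
    return [int.from_bytes(b[i:i + 2], "little") for i in range(0, len(b), 2)]


def utf8_encode(units: list[int]) -> bytes:
    """UTF-16 code units -> UTF-8 bytes; unpaired surrogates raise ValueError."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)  # surrogate pair
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"unpaired surrogate at index {i}")
        else:
            cp = u
            i += 1
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)


def utf8_decode(data: bytes) -> list[int]:
    """UTF-8 bytes -> UTF-16 code units; malformed input raises ValueError."""
    units: list[int] = []
    i = 0
    while i < len(data):
        b0 = data[i]
        # n = number of continuation bytes, low = smallest code point for this length
        if b0 < 0x80:
            n, cp, low = 0, b0, 0
        elif 0xC0 <= b0 < 0xE0:
            n, cp, low = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 < 0xF0:
            n, cp, low = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 < 0xF8:
            n, cp, low = 3, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte at {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at {i}")
        for j in range(i + 1, i + n + 1):
            if (data[j] & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at {j}")
            cp = (cp << 6) | (data[j] & 0x3F)
        if cp < low or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong, surrogate, or out-of-range sequence at {i}")
        if cp >= 0x10000:  # split into a surrogate pair
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
        i += n + 1
    return units


for s in ["", "abc", "héllo", "€100", "😀 and 𝄞"]:
    units = to_units(s)
    encoded = utf8_encode(units)
    assert encoded == s.encode("utf-8")   # agrees with Python's codec
    assert utf8_decode(encoded) == units  # decoding inverts encoding
```

Passing these samples only raises confidence; unlike the ESFT-based analysis, it says nothing about inputs that were not tried.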

This work was done during an internship at Microsoft Research and this research was partially supported by NSF Expeditions in Computing award CCF 1138996.




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

D’Antoni, L., Veanes, M. (2013). Static Analysis of String Encoders and Decoders. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds) Verification, Model Checking, and Abstract Interpretation. VMCAI 2013. Lecture Notes in Computer Science, vol 7737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35873-9_14


  • DOI: https://doi.org/10.1007/978-3-642-35873-9_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35872-2

  • Online ISBN: 978-3-642-35873-9

  • eBook Packages: Computer Science, Computer Science (R0)
