DOI: 10.1145/3314221.3314651
research-article

Parser-directed fuzzing

Published: 08 June 2019

ABSTRACT

To be effective, software test generation needs to cover the space of possible inputs well. Traditional fuzzing generates large numbers of random inputs, which, however, are unlikely to contain keywords and other specific inputs of non-trivial input languages. Constraint-based test generation solves conditions of paths leading to uncovered code, but fails on programs with complex input conditions because of path explosion. In this paper, we present a test generation technique specifically directed at input parsers. We systematically produce inputs for the parser and track comparisons made; after every rejection, we satisfy the comparisons leading to rejection. This approach effectively covers the input space: evaluated on five subjects, from CSV files to JavaScript, our pFuzzer prototype covers more tokens than both random-based and constraint-based approaches, while requiring no symbolic analysis and far fewer tests than random fuzzers.
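
The loop described in the abstract (produce an input, track the comparisons the parser makes, and after a rejection satisfy the comparison that led to it) can be illustrated with a small sketch. This is not the pFuzzer implementation: the toy grammar, the names `TracedInput`, `parse_expr`, and `fuzz`, and the choice to record comparisons through a wrapper object are all assumptions made for illustration. The prototype's actual instrumentation is not described on this page.

```python
import random
import string

# Assumption: characters the fuzzer may append when the parser wants more input.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


class Rejected(Exception):
    """The toy parser found an invalid character."""


class Incomplete(Exception):
    """The toy parser tried to read past the end of the input."""


class TracedInput:
    """Hypothetical wrapper that records every character comparison."""

    def __init__(self, text):
        self.text = text
        self.comparisons = []  # list of (index, expected characters)

    def matches(self, index, expected):
        if index >= len(self.text):
            raise Incomplete(index)
        self.comparisons.append((index, expected))
        return self.text[index] in expected


def parse_expr(inp):
    """Toy recursive-descent parser for: expr := DIGIT | '(' expr ')'."""

    def expr(i):
        if inp.matches(i, string.digits):
            return i + 1
        if inp.matches(i, "("):
            j = expr(i + 1)
            if inp.matches(j, ")"):
                return j + 1
        raise Rejected(i)

    if expr(0) != len(inp.text):
        raise Rejected(len(inp.text))


def fuzz(parser, rng, max_steps=200):
    """Grow inputs character by character, guided by recorded comparisons."""
    text, valid = "", []
    for _ in range(max_steps):
        inp = TracedInput(text)
        try:
            parser(inp)
        except Incomplete:
            # The parser asked for more input: append a random character.
            text += rng.choice(ALPHABET)
        except Rejected:
            if not inp.comparisons:
                text = ""
                continue
            # Replace the last compared character with one the parser
            # checked for at that position, dropping anything after it.
            last = max(i for i, _ in inp.comparisons)
            expected = "".join(e for i, e in inp.comparisons if i == last)
            text = text[:last] + rng.choice(expected)
        else:
            valid.append(text)  # accepted: keep it and start over
            text = ""
    return valid


if __name__ == "__main__":
    print(fuzz(parse_expr, random.Random(0))[:5])
```

Every string returned by `fuzz` is accepted by `parse_expr`. The sketch needs no grammar and no constraint solver: the characters the parser compares against are the only guidance used, which is the idea the abstract contrasts with random and constraint-based generation.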

Supplemental Material

p548-mathis.webm (webm, 84.7 MB)

Published in

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2019
1162 pages
ISBN: 9781450367127
DOI: 10.1145/3314221

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 406 of 2,067 submissions (20%)
