ABSTRACT
To be effective, software test generation must cover the space of possible inputs well. Traditional fuzzing generates large numbers of random inputs, but these are unlikely to contain the keywords and other specific tokens required by non-trivial input languages. Constraint-based test generation solves the conditions of paths leading to uncovered code, but fails on programs with complex input conditions because of path explosion. In this paper, we present a test generation technique specifically directed at input parsers. We systematically produce inputs for the parser and track the comparisons it makes; after every rejection, we modify the input to satisfy the comparisons that led to the rejection. This approach covers the input space effectively: evaluated on five subjects, from CSV files to JavaScript, our pFuzzer prototype covers more tokens than both random-based and constraint-based approaches, while requiring no symbolic analysis and far fewer tests than random fuzzers.
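The loop described in the abstract (run the parser, record the comparisons it makes, and after a rejection satisfy the comparison that failed) can be illustrated with a minimal Python sketch. This is not the pFuzzer implementation: the toy grammar, the names `TrackedInput`, `parse_value`, and `fuzz`, the fixed `?` placeholder character, and the breadth-first search order are all assumptions made for illustration.

```python
from collections import deque


class Incomplete(Exception):
    """The parser ran past the end of the input and wants more characters."""


class Rejected(Exception):
    """The parser rejected the input; args[0] is the offending index."""


class TrackedInput:
    """Wraps an input string and records every character comparison."""

    def __init__(self, text):
        self.text = text
        self.comparisons = []  # (index, expected_char) pairs

    def matches(self, index, expected):
        if index >= len(self.text):
            raise Incomplete()
        self.comparisons.append((index, expected))
        return self.text[index] == expected


def parse_value(inp, pos):
    """Hypothetical toy grammar: value := 'true' | 'null' | '[' value ']'."""
    if inp.matches(pos, "["):
        pos = parse_value(inp, pos + 1)
        if not inp.matches(pos, "]"):
            raise Rejected(pos)
        return pos + 1
    for keyword in ("true", "null"):
        if inp.matches(pos, keyword[0]):
            for offset, ch in enumerate(keyword):
                if not inp.matches(pos + offset, ch):
                    raise Rejected(pos + offset)
            return pos + len(keyword)
    raise Rejected(pos)


def parse(inp):
    end = parse_value(inp, 0)
    if end != len(inp.text):
        raise Rejected(end)  # trailing characters after a complete value


def fuzz(max_valid=4, max_runs=1000):
    """Breadth-first search over inputs, guided by recorded comparisons."""
    valid, queue, seen = [], deque([""]), {""}
    runs = 0
    while queue and len(valid) < max_valid and runs < max_runs:
        text = queue.popleft()
        runs += 1
        inp = TrackedInput(text)
        try:
            parse(inp)
            valid.append(text)  # accepted: keep it as a test input
            continue
        except Incomplete:
            # The parser wants more input: append a placeholder character.
            candidates = [text + "?"]
        except Rejected as err:
            # Replace the rejected character with each character the
            # parser compared against at that index.
            at = err.args[0]
            expected = [c for i, c in inp.comparisons if i == at]
            candidates = [text[:at] + c for c in dict.fromkeys(expected)]
        for cand in candidates:
            if cand not in seen:
                seen.add(cand)
                queue.append(cand)
    return valid


if __name__ == "__main__":
    print(fuzz())
```

Starting from the empty string, the sketch reaches keywords such as `true` and `null` without any symbolic analysis, because each rejection reveals exactly which characters the parser was looking for. A real tool would have to recover these comparisons by instrumenting the program under test rather than through a cooperative wrapper class.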
Index Terms
- Parser-directed fuzzing