research-article

Parser-directed fuzzing

Authors:
Björn Mathis, CISPA, Germany
Rahul Gopinath, CISPA, Germany
Michaël Mera, CISPA, Germany
Alexander Kampmann, CISPA, Germany
Matthias Höschele, CISPA, Germany
Andreas Zeller, CISPA, Germany

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2019, pages 548–560. https://doi.org/10.1145/3314221.3314651

Published: 8 June 2019

ABSTRACT

To be effective, software test generation needs to cover the space of possible inputs well. Traditional fuzzing generates large numbers of random inputs, which, however, are unlikely to contain keywords and other specific inputs of non-trivial input languages. Constraint-based test generation solves conditions of paths leading to uncovered code, but fails on programs with complex input conditions because of path explosion. In this paper, we present a test generation technique specifically directed at input parsers. We systematically produce inputs for the parser and track the comparisons it makes; after every rejection, we satisfy the comparisons leading to rejection. This approach effectively covers the input space: evaluated on five subjects, from CSV files to JavaScript, our pFuzzer prototype covers more tokens than both random-based and constraint-based approaches, while requiring no symbolic analysis and far fewer tests than random fuzzers.
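The loop summarized above can be illustrated with a small Python sketch. The loop runs the parser, records the comparisons it makes, and after a rejection satisfies the comparisons made at the point of rejection. This is not the pFuzzer implementation, which targets real parsers. Instead, the sketch assumes a hypothetical, hand-instrumented toy parser for sums of digits and parenthesized expressions, whose `TrackedInput` wrapper logs each character comparison as an `(index, expected)` pair. Several elements are assumptions chosen for illustration: the names `TrackedInput`, `Rejected`, `parse`, and `parser_directed_fuzz`, the breadth-first exploration order, and the rule that accepted inputs are extended using the comparisons made at the end of input.

```python
from collections import deque


class Rejected(Exception):
    """Raised when the toy parser rejects its input; args[0] is the index."""


class TrackedInput:
    """Hypothetical instrumented input that logs every character comparison.

    Comparisons past the end of the input are logged too, which tells the
    fuzzer what the parser would have accepted at that position.
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.comparisons = []  # (index, expected_char) in the order made

    def peek_is(self, expected):
        self.comparisons.append((self.pos, expected))
        return self.pos < len(self.text) and self.text[self.pos] == expected

    def expect(self, expected):
        if not self.peek_is(expected):
            raise Rejected(self.pos)
        self.pos += 1


# Toy grammar (an assumption for illustration):
#   expr := term ('+' term)*      term := digit | '(' expr ')'
def parse_term(inp):
    if inp.peek_is("("):
        inp.pos += 1
        parse_expr(inp)
        inp.expect(")")
        return
    for digit in "0123456789":
        if inp.peek_is(digit):
            inp.pos += 1
            return
    raise Rejected(inp.pos)


def parse_expr(inp):
    parse_term(inp)
    while inp.peek_is("+"):
        inp.pos += 1
        parse_term(inp)


def parse(inp):
    parse_expr(inp)
    if inp.pos != len(inp.text):  # trailing characters are not allowed
        raise Rejected(inp.pos)


def parser_directed_fuzz(parser, max_inputs=10, max_len=8):
    """Collect valid inputs by satisfying the comparisons the parser makes."""
    valid = []
    queue = deque([""])
    seen = {""}
    while queue and len(valid) < max_inputs:
        prefix = queue.popleft()
        inp = TrackedInput(prefix)
        try:
            parser(inp)
            valid.append(prefix)
            frontier = len(prefix)  # accepted: try to extend at end of input
        except Rejected as err:
            frontier = err.args[0]  # rejected: fix the character at this index
        # Each value compared at the frontier becomes a new candidate input.
        for index, expected in inp.comparisons:
            if index != frontier:
                continue
            candidate = prefix[:index] + expected
            if len(candidate) <= max_len and candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
    return valid


if __name__ == "__main__":
    print(parser_directed_fuzz(parse, max_inputs=30))
```

With `max_inputs=30`, the breadth-first order in this sketch returns the ten single digits, then `(0)` through `(9)`, then `0+0` through `0+9`. The delimiters `(`, `)`, and `+` appear because the parser compared against them, not because a random generator happened to produce them.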

Supplemental Material

p548-mathis.webm (WebM video, 84.7 MB)

Index Terms
- Security and privacy
  - Software and application security
    - Software security engineering

Published in
PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2019, 1162 pages
ISBN: 9781450367127
DOI: 10.1145/3314221
General Chair: Kathryn S. McKinley, Google, USA
Program Chair: Kathleen Fisher, Tufts University, USA

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags: fuzzing, parsers, security, test generation

Acceptance Rates
Overall acceptance rate: 406 of 2,067 submissions (20%)

Article Metrics
- 30 total citations
- 873 total downloads
- 137 downloads in the last 12 months
- 19 downloads in the last 6 weeks