Research article · DOI: 10.1145/3605157.3605173

Large Language Models for Fuzzing Parsers (Registered Report)

Published: 17 July 2023

Abstract

Ambiguity in format specifications is a significant source of software vulnerabilities. In this paper, we propose a natural language processing (NLP)-driven approach that implicitly leverages the ambiguity of format specifications to generate instances of a format for fuzzing. We employ a large language model (LLM) to recursively examine a natural-language format specification and generate instances of the format, which serve as strong seed examples for a mutation fuzzer. Preliminary experiments show that our method outperforms a basic mutation fuzzer and can synthesize examples from novel handwritten formats.
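The pipeline the abstract describes can be pictured as two stages: prompt an LLM with the natural-language specification to obtain seed instances, then hand those seeds to a mutation fuzzer. The sketch below is illustrative only and does not reproduce the paper's system: `llm_complete`, `make_seed_prompt`, and the toy byte-flipping mutator are all hypothetical names invented here, and the mutator is a stand-in for a real engine such as Radamsa.

```python
import random

# Hypothetical stand-in for a hosted LLM completion call; the real system
# would send the prompt to an actual model provider.
def llm_complete(prompt: str) -> str:
    raise NotImplementedError("wire in an LLM provider here")

def make_seed_prompt(spec: str, n: int = 5) -> str:
    """Build a prompt asking the model to read a natural-language format
    specification and emit concrete instances, one per line."""
    return (
        "Below is a natural-language specification of a data format.\n\n"
        f"{spec}\n\n"
        f"Write {n} distinct, valid instances of this format, "
        "one per line, with no commentary."
    )

def mutate(seed: bytes, rng: random.Random, n_flips: int = 4) -> bytes:
    """Toy mutator: XOR a few random bytes with nonzero values, so each
    flip changes the byte it touches."""
    data = bytearray(seed)
    for _ in range(n_flips):
        i = rng.randrange(len(data))
        data[i] ^= rng.randrange(1, 256)
    return bytes(data)

def fuzz_inputs(spec: str, rng: random.Random, rounds: int = 100):
    """Generate LLM seeds from the spec, then yield mutated variants."""
    seeds = [line.encode() for line in
             llm_complete(make_seed_prompt(spec)).splitlines() if line]
    for _ in range(rounds):
        yield mutate(rng.choice(seeds), rng)
```

In this framing, the LLM supplies format-aware seeds that a plain mutation fuzzer could not easily discover on its own, while the mutator explores the neighborhood of each seed.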


Cited By

  • (2024) DynER: Optimized Test Case Generation for Representational State Transfer (REST)ful Application Programming Interface (API) Fuzzers Guided by Dynamic Error Responses. Electronics 13(17), 3476. https://doi.org/10.3390/electronics13173476. Online publication date: 1-Sep-2024.
  • (2024) When Fuzzing Meets LLMs: Challenges and Opportunities. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 492–496. https://doi.org/10.1145/3663529.3663784. Online publication date: 10-Jul-2024.
  • (2024) Prompt Fuzzing for Fuzz Driver Generation. Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security, 3793–3807. https://doi.org/10.1145/3658644.3670396. Online publication date: 2-Dec-2024.
  • (2024) Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Transactions on Software Engineering 50(4), 911–936. https://doi.org/10.1109/TSE.2024.3368208. Online publication date: 20-Feb-2024.
  • (2024) Adversarial generation method for smart contract fuzz testing seeds guided by chain-based LLM. Automated Software Engineering 32(1). https://doi.org/10.1007/s10515-024-00483-4. Online publication date: 31-Dec-2024.

    Published In

    FUZZING 2023: Proceedings of the 2nd International Fuzzing Workshop
    July 2023, 61 pages
    ISBN: 9798400702471
    DOI: 10.1145/3605157

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Deep Learning
    2. Fuzzing
    3. Large Language Models
    4. Parsers

    Qualifiers

    • Research-article

    Funding Sources

    • DARPA

    Conference

    FUZZING '23


    Article Metrics

    • Downloads (Last 12 months): 333
    • Downloads (Last 6 weeks): 52
    Reflects downloads up to 17 Jan 2025.

