Fuzzing-Based Grammar Inference

Sochor, Hannes; Ferrarotti, Flavio; Kaufmann, Daniela

doi:10.1007/978-3-031-21595-7_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13761)

Included in the following conference series:

International Conference on Model and Data Engineering

Abstract

In this paper we propose a novel approach to grammar inference that is based on grammar-based fuzzing. While executing a target program with random inputs, our method identifies the program's input language as a human-readable context-free grammar. Our strategy, which integrates machine learning techniques with program analysis of call trees, requires a far smaller set of seed inputs than earlier work. As a further contribution, we combine grammar inference and grammar-based fuzzing so that information from randomly generated samples is incorporated into the inference. Our evaluation shows that the technique is effective in practice and that the input languages of the tested recursive-descent parsers are correctly inferred.
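
The abstract only summarizes the method, so the following is a minimal, hypothetical sketch of the two ideas it combines, not the authors' implementation. A toy recursive-descent parser (the assumed target program) records its dynamic call tree. Each parser function then becomes a nonterminal, and each observed call contributes its ordered children as one alternative. Finally, the inferred grammar drives a simple grammar-based fuzzer. The names `TracingParser`, `call_tree`, `infer_grammar`, `min_costs`, and `fuzz` are illustrative assumptions, and the sketch omits the machine-learning generalization step the paper describes.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Call:
    """One dynamic call-tree node: a parser function and what it consumed."""
    name: str
    children: list = field(default_factory=list)  # nested Calls or input chars

class TracingParser:
    """Hypothetical target: expr := term ('+' term)* ; term := digit | '(' expr ')'."""

    def __init__(self, text):
        self.text, self.pos, self.stack = text, 0, [Call("start")]  # root sentinel

    def _enter(self, name):  # record a call to parser function `name`
        node = Call(name)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def _peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _eat(self):  # consume one char, attributing it to the current call
        ch = self._peek()
        if not ch:
            raise SyntaxError("unexpected end of input")
        self.stack[-1].children.append(ch)
        self.pos += 1
        return ch

    def expr(self):
        self._enter("expr")
        self.term()
        while self._peek() == "+":
            self._eat()
            self.term()
        self.stack.pop()

    def term(self):
        self._enter("term")
        if self._peek().isdigit():
            self._eat()
        elif self._peek() == "(":
            self._eat()
            self.expr()
            if self._eat() != ")":
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError(f"unexpected {self._peek()!r}")
        self.stack.pop()

def call_tree(text):
    """Run the parser and return the call tree of the top-level expr call."""
    parser = TracingParser(text)
    parser.expr()
    if parser.pos != len(text):
        raise SyntaxError("trailing input")
    return parser.stack[0].children[0]

def infer_grammar(trees):
    """One nonterminal per function; each observed call adds its ordered children
    (callees as <nonterminals>, consumed chars as terminals) as one alternative."""
    grammar = {}
    def visit(node):
        rhs = []
        for child in node.children:
            if isinstance(child, Call):
                rhs.append(f"<{child.name}>")
                visit(child)
            else:
                rhs.append(child)
        grammar.setdefault(f"<{node.name}>", set()).add(tuple(rhs))
    for tree in trees:
        visit(tree)
    return grammar

def min_costs(grammar):
    """Fixpoint: cost[A] = height of the shallowest terminal derivation from A."""
    cost = dict.fromkeys(grammar, float("inf"))
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                c = 1 + max((cost[s] for s in alt if s in grammar), default=0)
                if c < cost[nt]:
                    cost[nt], changed = c, True
    return cost

def fuzz(grammar, cost, symbol, rng, depth=0, max_depth=6):
    """Expand randomly; past max_depth keep only alternatives whose nonterminals
    are strictly cheaper than `symbol`, which guarantees termination."""
    if symbol not in grammar:
        return symbol  # terminal
    alts = sorted(grammar[symbol])
    if depth >= max_depth:
        alts = [a for a in alts if all(cost[s] < cost[symbol] for s in a if s in grammar)]
    return "".join(fuzz(grammar, cost, s, rng, depth + 1, max_depth) for s in rng.choice(alts))
```

For the two seeds `1+2` and `(3)`, `infer_grammar` yields `<expr>` with the alternatives `<term> + <term>` and `<term>`, and `<term>` with `1`, `2`, `3`, and `( <expr> )`. Because alternatives from different calls are merged per function, the fuzzer can produce inputs such as `(1+2)+3` that occur in neither seed. Running each generated input back through the parser is one simple way to check it; the sketch does not feed those results back into inference.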

Notes

1. https://ssw.jku.at/Research/Projects/Coco/

Author information

Authors and Affiliations

Software Competence Center Hagenberg GmbH (SCCH), Hagenberg, Austria
Hannes Sochor, Flavio Ferrarotti & Daniela Kaufmann

Corresponding author

Correspondence to Hannes Sochor.

Editor information

Editors and Affiliations

Shenzhen University, Shenzhen, Guangdong, China
Philippe Fournier-Viger
Nile University, Giza, Egypt
Ahmed Hassan
ISAE-ENSMA, Poitiers, France
Ladjel Bellatreche

About this paper

Cite this paper

Sochor, H., Ferrarotti, F., Kaufmann, D. (2023). Fuzzing-Based Grammar Inference. In: Fournier-Viger, P., Hassan, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2022. Lecture Notes in Computer Science, vol 13761. Springer, Cham. https://doi.org/10.1007/978-3-031-21595-7_6

DOI: https://doi.org/10.1007/978-3-031-21595-7_6
Published: 19 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21594-0
Online ISBN: 978-3-031-21595-7
eBook Packages: Computer Science, Computer Science (R0)
