Abstract
In this paper we propose a novel approach to grammar inference that is based on grammar-based fuzzing. While executing a target program with random inputs, our method identifies the program's input language as a human-readable context-free grammar. Our strategy, which integrates machine learning techniques with program analysis of call trees, uses a far smaller set of seed inputs than earlier work. As a further contribution, we combine the processes of grammar inference and grammar-based fuzzing to incorporate random sample information into our inference technique. Our evaluation shows that our technique is effective in practice and that the input languages of the tested recursive-descent parsers are correctly inferred.
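As a rough illustration of the grammar-based fuzzing step that the approach builds on, the sketch below draws random sentences from a context-free grammar. It is not the authors' implementation: the toy arithmetic grammar, the `generate` function, and the `max_depth` cutoff are all assumptions made for this example.

```python
import random

# Hypothetical toy grammar (not taken from the paper): each nonterminal maps
# to a list of alternatives, each alternative being a sequence of symbols.
GRAMMAR = {
    "<expr>": [["<term>", "+", "<expr>"], ["<term>"]],
    "<term>": [["(", "<expr>", ")"], ["<number>"]],
    "<number>": [["<digit>", "<number>"], ["<digit>"]],
    "<digit>": [[d] for d in "0123456789"],
}


def generate(symbol, grammar, rng, depth=0, max_depth=8):
    """Expand `symbol` into a random string derived from `grammar`."""
    if symbol not in grammar:
        return symbol  # terminals are emitted unchanged
    alternatives = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, always take the shortest alternative so that
        # the derivation is forced to terminate (true for this toy grammar).
        alternatives = [min(alternatives, key=len)]
    expansion = rng.choice(alternatives)
    return "".join(
        generate(part, grammar, rng, depth + 1, max_depth) for part in expansion
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make runs repeatable
    for _ in range(5):
        print(generate("<expr>", GRAMMAR, rng))
```

In a setting like the one described in the abstract, strings sampled this way would be fed to the target program, and its accept or reject behaviour would then inform the next refinement of the candidate grammar.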
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sochor, H., Ferrarotti, F., Kaufmann, D. (2023). Fuzzing-Based Grammar Inference. In: Fournier-Viger, P., Hassan, A., Bellatreche, L. (eds) Model and Data Engineering. MEDI 2022. Lecture Notes in Computer Science, vol 13761. Springer, Cham. https://doi.org/10.1007/978-3-031-21595-7_6
DOI: https://doi.org/10.1007/978-3-031-21595-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21594-0
Online ISBN: 978-3-031-21595-7
eBook Packages: Computer Science, Computer Science (R0)