Abstract
A decompiler is a system that recovers source code from bytecode. A critical challenge in decompilation is that decompiled code often differs from the original code. These differences, referred to as quirks, not only reduce the readability of the source code but may also change the program's behavior. In this study, we propose a deep learning-based quirk fixation method that applies a grammatical error correction approach. One advantage of the proposed method is that it can be applied to any decompiler and any programming language. Our experimental results show that the proposed method removes 55% of identifier quirks and 91% of structural quirks. In some cases, however, the proposed method introduced a small number of new quirks.
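To make the reported percentages concrete, here is one simple way to measure how many quirks a fixer removes and how many new ones it introduces. This is a minimal illustration under stated assumptions, not the authors' evaluation procedure. It approximates a quirk as a token-level edit operation (computed with Python's `difflib.SequenceMatcher`) between the original source and a candidate. It also does not separate identifier quirks from structural quirks. The helpers `tokenize`, `quirks`, and `fixation_report` are hypothetical names introduced only for this sketch.

```python
from collections import Counter
from difflib import SequenceMatcher
import re

# Assumption: a crude tokenizer for Java-like code (identifiers, integers, single symbols).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    """Split source text into identifier, number, and single-character symbol tokens."""
    return TOKEN_RE.findall(code)


def quirks(original: str, candidate: str) -> Counter:
    """Approximate quirks as the non-equal edit operations turning original into candidate."""
    a, b = tokenize(original), tokenize(candidate)
    ops = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return Counter(
        (tag, tuple(a[i1:i2]), tuple(b[j1:j2]))
        for tag, i1, i2, j1, j2 in ops
        if tag != "equal"
    )


def fixation_report(original: str, decompiled: str, fixed: str) -> tuple[float, int]:
    """Return (fraction of decompiler quirks removed, number of newly injected quirks)."""
    before = quirks(original, decompiled)  # quirks left by the decompiler
    after = quirks(original, fixed)        # quirks remaining after fixation
    removed = before - after               # Counter subtraction keeps positive counts only
    injected = after - before              # differences the fixer itself introduced
    total = sum(before.values())
    rate = sum(removed.values()) / total if total else 1.0
    return rate, sum(injected.values())


if __name__ == "__main__":
    # The decompiler renamed `total` to `n` and added redundant parentheses (3 edits);
    # the fixer restored the name but kept the parentheses.
    rate, injected = fixation_report(
        "int total = count + 1;",
        "int n = (count) + 1;",
        "int total = (count) + 1;",
    )
    print(rate, injected)  # rate == 1/3, injected == 0
```

A token-level diff keeps the sketch dependency-free. An AST-level differencing tool would classify structural changes more faithfully than raw token edits.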
Acknowledgements
This research was partially supported by JSPS KAKENHI Japan (Grant Numbers JP21H04877, JP20H04166, JP21K18302, and JP21K11829).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kaichi, R., Matsumoto, S., Kusumoto, S. (2024). Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model. In: Kadgien, R., Jedlitschka, A., Janes, A., Lenarduzzi, V., Li, X. (eds) Product-Focused Software Process Improvement. PROFES 2023. Lecture Notes in Computer Science, vol 14483. Springer, Cham. https://doi.org/10.1007/978-3-031-49266-2_18
DOI: https://doi.org/10.1007/978-3-031-49266-2_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49265-5
Online ISBN: 978-3-031-49266-2
eBook Packages: Computer Science, Computer Science (R0)