ABSTRACT
Binary function naming is a code analysis task that generates functional descriptions of functions in binary code; its results can be applied to malicious code analysis, vulnerability cause analysis, and algorithm governance. To address the difficulty of extracting abstract syntax trees from pseudocode and the low accuracy of existing binary function naming schemes, this paper proposes A2N, a binary function name prediction model based on variable alignment and a sequence translation model. First, A2N extracts function variable features from the debugging information of binary files and aligns these variables with the pseudocode obtained by decompilation. Second, it obtains the hierarchical structure of each binary function and applies node extraction rules designed for this purpose to generate an abstract syntax tree (AST) for each function. Third, it extracts the paths between the leaf nodes of the AST, which serializes the tree structure. Finally, it uses a neural translation model to establish a mapping between the AST and binary function names, which enables name prediction. Experimental results show that, compared with the Dire, Nero, and XFL models, A2N improves the F1 score by 84%, 44%, and 14%, respectively, in file-level isolation experiments, and reaches an F1 score of 80.94% in function-level isolation experiments.
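The path-extraction step can be illustrated with a minimal sketch. It is not the A2N implementation: the `Node` class, the node labels, and the `start,interior,end` serialization are assumptions made for illustration, loosely following the leaf-to-leaf path contexts used by code2seq. For every pair of leaves, the path climbs from the first leaf to the lowest common ancestor and then descends to the second leaf.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Hypothetical AST node: a label plus ordered children."""
    label: str
    children: list["Node"] = field(default_factory=list)


def root_to_leaf_paths(node, prefix=()):
    """Yield the tuple of nodes from the root down to each leaf, left to right."""
    path = prefix + (node,)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from root_to_leaf_paths(child, path)


def leaf_pair_paths(root):
    """Return (start leaf, interior labels, end leaf) for every pair of leaves."""
    result = []
    for a, b in combinations(list(root_to_leaf_paths(root)), 2):
        # Count the shared prefix; its last node is the lowest common ancestor.
        shared = 0
        while a[shared] is b[shared]:
            shared += 1
        up = [n.label for n in reversed(a[shared:-1])]    # climb from leaf a
        down = [n.label for n in b[shared:-1]]            # descend to leaf b
        interior = up + [a[shared - 1].label] + down
        result.append((a[-1].label, interior, b[-1].label))
    return result


def serialize(paths):
    """Flatten path triples into one whitespace-separated token sequence."""
    return " ".join(f"{s},{'|'.join(mid)},{e}" for s, mid, e in paths)


# Assumed toy AST for pseudocode such as: x = strlen(s); return x;
example = Node("Block", [
    Node("Assign", [Node("x"), Node("Call", [Node("strlen"), Node("s")])]),
    Node("Return", [Node("x")]),
])

if __name__ == "__main__":
    print(serialize(leaf_pair_paths(example)))
```

A sequence of this kind could then serve as the source side of a translation model whose target side is the function name split into subtokens. How A2N actually tokenizes paths and names is not stated in the abstract.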
REFERENCES
- WANG P H, PEI H B, ZHAO J Z, et al. Challenges and measurements for governance of modern cyber space society [J]. Bulletin of Chinese Academy of Sciences, 2022, 37(12): 1686-1694.
- QU S W, ZHANG B X. Practical Development and System Construction of Algorithm Governance [J]. Renming Luntan·Xueshu Qianyan, 2023, (03): 108-111.
- HU X. Prediction of code function names based on Natural Language Processing [D]. Tianjin University, 2023.
- PAN X L, LIU C X, WANG M, et al. Code Comment Generation Based on Concept Propagation for Software Projects [J/OL]. Journal of Software: 1-18, 2023, 04, 14.
- Allamanis M, Barr E T, Devanbu P, et al. A survey of machine learning for big code and naturalness [J]. ACM Computing Surveys (CSUR), 2018, 51(4): 1-37.
- Bavishi R, Pradel M, Sen K. Context2Name: A deep learning-based approach to infer natural variable names from usage contexts [J]. arXiv preprint arXiv:1809.05193, 2018.
- Zou D, Wang S, Xu S, et al. μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection [J]. IEEE Transactions on Dependable and Secure Computing, 2019, 18(5): 2224-2236.
- Eagle C, Nance K. The Ghidra Book: The Definitive Guide [M]. No Starch Press, 2020.
- Jaffe A, Lacomis J, Schwartz E J, et al. Meaningful variable names for decompiled code: A machine translation approach [C]//Proceedings of the 26th Conference on Program Comprehension. 2018: 20-30.
- Biondi P, Rigo R, Zennou S, et al. BinCAT: purrfecting binary static analysis [C]//Symposium sur la sécurité des technologies de l'information et des communications. 2017.
- He J, Ivanov P, Tsankov P, et al. Debin: Predicting debug information in stripped binaries [C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 2018: 1667-1680.
- Lacomis J, Yin P, Schwartz E, et al. Dire: A neural approach to decompiled identifier naming [C]//2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019: 628-639.
- David Y, Alon U, Yahav E. Neural reverse engineering of stripped binaries using augmented control flow graphs [J]. Proceedings of the ACM on Programming Languages, 2020, 4(OOPSLA): 1-28.
- Patrick-Evans J, Dannehl M, Kinder J. XFL: extreme function labeling [J]. arXiv e-prints, 2021, arXiv: 2107.13404.
- Alon U, Brody S, Levy O, et al. code2seq: Generating sequences from structured representations of code [J]. arXiv preprint arXiv:1808.01400, 2018.
- Klein G, Kim Y, Deng Y, et al. OpenNMT: Open-source toolkit for neural machine translation [J]. arXiv preprint arXiv:1701.02810, 2017.
- MacKenzie D. GNU Coreutils [J]. Free Software Foundation, Inc, 1994, 2022.
Index Terms
- A Binary Function Name Prediction Method Based on Variable Alignment and Translation Model