research-article

A Binary Function Name Prediction Method Based on Variable Alignment and Translation Model

Authors:

Ruinan Yang

ICMLCA '23: Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application

Pages 757 - 761

https://doi.org/10.1145/3650215.3650347

Published: 16 April 2024

Abstract

Binary function naming is a code analysis task that generates names describing what the functions in a binary do; its results can be applied to malicious code analysis, vulnerability root-cause analysis, and algorithm governance. To address two shortcomings of existing work, namely that the abstract syntax tree of pseudocode is difficult to extract and that binary function naming schemes have low accuracy, this paper proposes A2N, a binary function name prediction model based on variable alignment and a sequence translation model. First, A2N extracts the variable features of functions from the debugging information of binary files and aligns them with the variables in the pseudocode obtained by decompilation. Second, it obtains the hierarchical structure of each binary function and applies purpose-designed node extraction rules to generate an abstract syntax tree (AST) for the function. Third, it extracts the paths between the leaf nodes of the AST to serialize the tree structure. Finally, it uses a neural machine translation model to learn a mapping from the AST to the binary function name, which realizes the prediction. Experimental results show that, compared with the Dire, Nero, and XFL models, A2N improves the F1 score by 84%, 44%, and 14%, respectively, in file-level isolation experiments, and reaches an F1 score of 80.94% in function-level isolation experiments.
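The step of extracting paths between AST leaf nodes can be illustrated with a minimal Python sketch. It is not the A2N implementation: the abstract does not give the node extraction rules or the path format, so the `Node` class, the node labels, and the `|` separator below are all assumptions made for illustration. The sketch only shows the general idea, familiar from leaf-to-leaf path representations such as code2seq, of describing a tree as triples of (start leaf, path through the lowest common ancestor, end leaf).

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical AST node: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def root_to_leaf_paths(root: Node) -> List[List[Node]]:
    """Return, for every leaf, the list of nodes from the root down to it."""
    paths = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        if not node.children:
            paths.append(path)
        # Push children in reverse so leaves are visited left to right.
        for child in reversed(node.children):
            stack.append((child, path + [child]))
    return paths


def leaf_to_leaf_paths(root: Node) -> List[Tuple[str, str, str]]:
    """Serialize a tree as (start leaf, path, end leaf) triples."""
    triples = []
    for a, b in combinations(root_to_leaf_paths(root), 2):
        # k is the length of the shared prefix; a[k - 1] is the
        # lowest common ancestor of the two leaves.
        k = 0
        while k < min(len(a), len(b)) and a[k] is b[k]:
            k += 1
        up = [n.label for n in reversed(a[k:-1])]  # climb from the first leaf
        down = [n.label for n in b[k:-1]]          # descend to the second leaf
        path = "|".join(up + [a[k - 1].label] + down)
        triples.append((a[-1].label, path, b[-1].label))
    return triples


# Assumed tree for pseudocode resembling: if (n > 0) return buf;
example = Node("if", [
    Node("gt", [Node("n"), Node("0")]),
    Node("return", [Node("buf")]),
])

for triple in leaf_to_leaf_paths(example):
    print(triple)
```

Each triple could then be treated as one element of the source sequence given to a translation model, with the function name as the target. How A2N actually tokenizes and encodes these paths is not stated in the abstract.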

Index Terms

1. Security and privacy
  1. Software and application security
    1. Software reverse engineering

Published In

ICMLCA '23: Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application

October 2023

1065 pages

ISBN:9798400709449

DOI:10.1145/3650215

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Henan Province Science and Technology Research Projects

Conference

ICMLCA 2023

ICMLCA 2023: 2023 4th International Conference on Machine Learning and Computer Application

October 27 - 29, 2023

Hangzhou, China
