DOI: 10.1145/3650215.3650347

A Binary Function Name Prediction Method Based on Variable Alignment and Translation Model

Published: 16 April 2024

Abstract

Binary function naming is a code analysis task that generates names describing what functions in a binary do; its results can be applied to malicious code analysis, vulnerability root-cause analysis, and algorithm governance. To address two shortcomings of existing approaches, namely that the abstract syntax tree of pseudocode is difficult to extract and that binary function naming schemes have low accuracy, this paper proposes A2N, a binary function name prediction model based on variable alignment and a sequence translation model. First, A2N extracts function variable features from the debugging information of binary files and aligns these variables with the pseudocode obtained by decompilation. Second, it obtains the hierarchical structure of each binary function and applies purpose-designed node extraction rules to generate an abstract syntax tree (AST) for each function. Third, it extracts the paths between the leaf nodes of the AST, which yields a serialized representation of the tree structure. Finally, it uses a neural network translation model to learn a mapping from the AST to binary function names, and uses that mapping for prediction. Experimental results show that, compared with the Dire, Nero, and XFL models, A2N improves the F1 value by 84%, 44%, and 14%, respectively, in file-level isolation experiments, and reaches an F1 value of 80.94% in function-level isolation experiments.
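
The path-extraction step can be illustrated with a small sketch. The abstract does not give A2N's node extraction rules or its exact serialization format, so everything below is an assumption made for illustration: the `Node` class, the node labels (`Return`, `Add`, `Call`), the `|` separator, and the `max_length` cutoff are hypothetical. The sketch only shows the general idea of turning an AST into a sequence of leaf-to-leaf paths, each running from one leaf up to the lowest common ancestor and down to another leaf.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Hypothetical AST node: a label plus ordered children (leaves have none)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def root_to_leaf_chains(root):
    """Return one root-to-leaf chain of nodes per leaf, in left-to-right order."""
    chains = []

    def walk(node, chain):
        chain = chain + [node]
        if not node.children:
            chains.append(chain)
        for child in node.children:
            walk(child, chain)

    walk(root, [])
    return chains


def leaf_to_leaf_paths(root, max_length=8):
    """Return (start_leaf, path, end_leaf) triples for every pair of leaves.

    The path lists the internal nodes from the first leaf up to the lowest
    common ancestor and down to the second leaf. Paths with more than
    max_length internal nodes are dropped (an assumed cutoff).
    """
    triples = []
    for a, b in combinations(root_to_leaf_chains(root), 2):
        # Length of the shared prefix; a[i - 1] is the lowest common ancestor.
        i = 0
        while i < min(len(a), len(b)) and a[i] is b[i]:
            i += 1
        up = [n.label for n in reversed(a[i:-1])]
        down = [n.label for n in b[i:-1]]
        inner = up + [a[i - 1].label] + down
        if len(inner) <= max_length:
            triples.append((a[-1].label, "|".join(inner), b[-1].label))
    return triples


def serialize(triples):
    """Flatten each triple into one whitespace-separated line."""
    return [f"{start} {path} {end}" for start, path, end in triples]


# Toy AST for the pseudocode statement `return strlen(s) + 1;`
tree = Node("Return", [
    Node("Add", [
        Node("Call", [Node("strlen"), Node("s")]),
        Node("1"),
    ]),
])

paths = leaf_to_leaf_paths(tree)
for line in serialize(paths):
    print(line)
# strlen Call s
# strlen Call|Add 1
# s Call|Add 1
```

In a translation-based setup such as the one the abstract describes, lines like these would form the source sequence and the function name would be the target sequence; how A2N tokenizes either side is not stated in the abstract.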


Published In

ICMLCA '23: Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application
October 2023
1065 pages
ISBN: 9798400709449
DOI: 10.1145/3650215

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

• Research-article
• Research
• Refereed limited

Funding Sources

• Henan Province Science and Technology Research Projects

Conference

ICMLCA 2023
