DOI: 10.1145/3650215.3650347
research-article

A Binary Function Name Prediction Method Based on Variable Alignment and Translation Model

Published: 16 April 2024

ABSTRACT

Binary function naming is a code analysis task that generates a functional description of each function in a binary; its results can be applied to malicious code analysis, vulnerability root-cause analysis, and algorithm governance. To address two shortcomings of existing approaches, namely that the pseudocode abstract syntax tree (AST) is difficult to extract and that binary function naming schemes have low accuracy, a binary function name prediction model, A2N, based on variable alignment and a sequence translation model is proposed. First, A2N extracts the function variable features of binary files from debugging information and aligns them with the pseudocode obtained by decompilation. Second, it obtains the hierarchical structure of the binary functions and designs node extraction rules to generate an AST for each function. Third, it extracts the paths between the leaf nodes of the AST to serialize the tree structure. Finally, it uses a neural translation model to establish a mapping between the AST and the binary function names, which realizes the prediction. Experimental results show that, compared with the Dire, Nero, and XFL models, A2N improves the F1 score by 84%, 44%, and 14%, respectively, in file-level isolation experiments, and reaches an F1 score of 80.94% in function-level isolation experiments.
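The path-extraction step can be illustrated with a minimal sketch. It is not the A2N implementation: it follows the general leaf-to-leaf path idea popularized by code2seq (reference 15), and the `Node` class, the function names, the `|` separator, and the `max_length` cutoff are all assumptions introduced here for illustration. The abstract does not specify A2N's node extraction rules or path format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """Hypothetical toy AST node: a node type, or a token if it is a leaf."""
    label: str
    children: list[Node] = field(default_factory=list)


def leaf_routes(root: Node) -> list[list[Node]]:
    """Return, for every leaf, the list of nodes from the root down to it."""
    routes: list[list[Node]] = []

    def walk(node: Node, trail: list[Node]) -> None:
        trail = trail + [node]
        if not node.children:
            routes.append(trail)
        for child in node.children:
            walk(child, trail)

    walk(root, [])
    return routes


def leaf_to_leaf_paths(root: Node, max_length: int = 8) -> list[tuple[str, str, str]]:
    """Serialize the tree as (left leaf, path, right leaf) triples.

    `max_length` (an assumed cutoff) bounds the number of internal nodes
    on a path so that very distant leaf pairs are dropped.
    """
    triples = []
    for a, b in combinations(leaf_routes(root), 2):
        # Count how many nodes the two root-to-leaf routes share.
        shared = 0
        while shared < min(len(a), len(b)) and a[shared] is b[shared]:
            shared += 1
        # a[shared - 1] is the lowest common ancestor of the two leaves.
        up = [n.label for n in reversed(a[shared:-1])]
        top = a[shared - 1].label
        down = [n.label for n in b[shared:-1]]
        path = up + [top] + down
        if len(path) <= max_length:
            triples.append((a[-1].label, "|".join(path), b[-1].label))
    return triples


# Toy tree for the pseudocode `if (n > 0) return n;`
example = Node("If", [
    Node("Gt", [Node("n"), Node("0")]),
    Node("Return", [Node("n")]),
])

for left, path, right in leaf_to_leaf_paths(example):
    print(left, path, right)
```

Each triple is one "word" of the serialized function; a sequence-to-sequence translation model could then be trained to map the set of triples to the subtokens of the function name.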

References

  1. WANG P H, PEI H B, ZHAO J Z, et al. Challenges and measurements for governance of modern cyber space society[J]. Bulletin of Chinese Academy of Sciences, 2022, 37(12): 1686-1694.
  2. QU S W, ZHANG B X. Practical Development and System Construction of Algorithm Governance. Renmin Luntan·Xueshu Qianyan, 2023, (03): 108-111.
  3. HU X. Prediction of code function names based on Natural Language Processing[D]. Tianjin University, 2023.
  4. PAN X L, LIU C X, WANG M, et al. Code Comment Generation Based on Concept Propagation for Software Projects[J/OL]. Journal of Software: 1-18, 2023, 04, 14.
  5. Allamanis M, Barr E T, Devanbu P, et al. A survey of machine learning for big code and naturalness[J]. ACM Computing Surveys (CSUR), 2018, 51(4): 1-37.
  6. Bavishi R, Pradel M, Sen K. Context2Name: A deep learning-based approach to infer natural variable names from usage contexts[J]. arXiv preprint arXiv:1809.05193, 2018.
  7. Zou D, Wang S, Xu S, et al. μVulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection[J]. IEEE Transactions on Dependable and Secure Computing, 2019, 18(5): 2224-2236.
  8. Eagle C, Nance K. The Ghidra Book: The Definitive Guide[M]. No Starch Press, 2020.
  9. Jaffe A, Lacomis J, Schwartz E J, et al. Meaningful variable names for decompiled code: A machine translation approach[C]//Proceedings of the 26th Conference on Program Comprehension. 2018: 20-30.
  10. Biondi P, Rigo R, Zennou S, et al. BinCAT: purrfecting binary static analysis[C]//Symposium sur la sécurité des technologies de l'information et des communications. 2017.
  11. He J, Ivanov P, Tsankov P, et al. Debin: Predicting debug information in stripped binaries[C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. 2018: 1667-1680.
  12. Lacomis J, Yin P, Schwartz E, et al. Dire: A neural approach to decompiled identifier naming[C]//2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019: 628-639.
  13. David Y, Alon U, Yahav E. Neural reverse engineering of stripped binaries using augmented control flow graphs[J]. Proceedings of the ACM on Programming Languages, 2020, 4(OOPSLA): 1-28.
  14. Patrick-Evans J, Dannehl M, Kinder J. XFL: extreme function labeling[J]. arXiv e-prints, 2021, arXiv: 2107.13404.
  15. Alon U, Brody S, Levy O, et al. code2seq: Generating sequences from structured representations of code[J]. arXiv preprint arXiv:1808.01400, 2018.
  16. Klein G, Kim Y, Deng Y, et al. OpenNMT: Open-source toolkit for neural machine translation[J]. arXiv preprint arXiv:1701.02810, 2017.
  17. David Y, Alon U, Yahav E. Neural reverse engineering of stripped binaries using augmented control flow graphs[J]. Proceedings of the ACM on Programming Languages, 2020, 4(OOPSLA): 1-28.
  18. Lacomis J, Yin P, Schwartz E, et al. Dire: A neural approach to decompiled identifier naming[C]//2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2019: 628-639.
  19. MacKenzie D. GNU Coreutils[J]. Free Software Foundation, Inc, 1994, 2022.

    • Published in

      ICMLCA '23: Proceedings of the 2023 4th International Conference on Machine Learning and Computer Application
      October 2023
      1065 pages
      ISBN: 9798400709449
      DOI: 10.1145/3650215

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 16 April 2024


      Qualifiers

      • research-article
      • Research
      • Refereed limited