DOI: 10.1145/3468264.3468607

StateFormer: fine-grained type recovery from binaries using generative state modeling

Published: 18 August 2021

ABSTRACT

Binary type inference is a critical reverse engineering task supporting many security applications, including vulnerability analysis, binary hardening, forensics, and decompilation. It is a difficult task because source-level type information is often stripped during compilation, leaving only binaries with untyped memory and register accesses. Existing approaches rely on hand-coded type inference rules defined by domain experts, which are brittle and require nontrivial effort to maintain and update. Even though machine learning approaches have shown promise at automatically learning the inference rules, their accuracy is still low, especially for optimized binaries.

We present StateFormer, a new neural architecture that is adept at accurate and robust type inference. StateFormer follows a two-step transfer learning paradigm. In the pretraining step, the model is trained with Generative State Modeling (GSM), a novel task that we design to teach the model to statically approximate execution effects of assembly instructions in both forward and backward directions. In the finetuning step, the pretrained model learns to use its knowledge of operational semantics to infer types.
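To make the pretraining idea concrete, the following is a minimal sketch of how masked-state training examples could be built for a toy instruction set. It is an illustration under stated assumptions, not StateFormer's actual GSM pipeline: the three-opcode instruction format, the register names, the `<mask>` token, and the helpers `step`, `trace`, and `make_masked_example` are all hypothetical, and the toy obtains ground-truth states by executing the program, a detail this abstract does not specify.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token for a hidden register value


def step(state, instr):
    """Return the register state after one toy instruction (op, dst, src)."""
    op, dst, src = instr
    # src is either a register name (str) or an integer immediate.
    val = state[src] if isinstance(src, str) else src
    new_state = dict(state)
    if op == "mov":
        new_state[dst] = val
    elif op == "add":
        new_state[dst] = state[dst] + val
    elif op == "sub":
        new_state[dst] = state[dst] - val
    else:
        raise ValueError(f"unknown opcode: {op}")
    return new_state


def trace(program, init):
    """Return the list of register states before and after each instruction."""
    states = [dict(init)]
    for instr in program:
        states.append(step(states[-1], instr))
    return states


def make_masked_example(program, init, mask_prob=0.5, seed=0):
    """Hide a random subset of register values; a model would be trained to
    reconstruct them from the instructions and the values left visible."""
    rng = random.Random(seed)
    inputs, targets = [], {}
    for t, state in enumerate(trace(program, init)):
        visible = {}
        for reg, value in state.items():
            if rng.random() < mask_prob:
                visible[reg] = MASK
                targets[(t, reg)] = value  # label the model must predict
            else:
                visible[reg] = value
        inputs.append(visible)
    return inputs, targets


program = [("mov", "eax", 5), ("add", "eax", "ebx"), ("sub", "ebx", 1)]
init = {"eax": 0, "ebx": 2}
inputs, targets = make_masked_example(program, init)
```

Because a value can be hidden at any step, recovering some targets in this toy setup requires reasoning from earlier visible states and others from later ones, which loosely mirrors the forward and backward directions mentioned above.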

We evaluate StateFormer's performance on a corpus of 33 popular open-source software projects containing over 1.67 billion variables of different types. The programs are compiled with GCC and LLVM over 4 optimization levels O0-O3, and 3 obfuscation passes based on LLVM. Our model significantly outperforms state-of-the-art ML-based tools by 14.6% in recovering types for both function arguments and variables. Our ablation studies show that GSM improves type inference accuracy by 33%.

      • Published in

        ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
        August 2021
        1690 pages
        ISBN: 9781450385626
        DOI: 10.1145/3468264

        Copyright © 2021 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate: 112 of 543 submissions, 21%
