research-article

StateFormer: fine-grained type recovery from binaries using generative state modeling

Authors:
Kexin Pei, Columbia University, USA
Jonas Guan, University of Toronto, Canada
Matthew Broughton, Columbia University, USA
Zhongtian Chen, Columbia University, USA
Songchen Yao, Columbia University, USA
David Williams-King, Columbia University, USA
Vikas Ummadisetty, Dublin High School, Ireland
Junfeng Yang, Columbia University, USA
Baishakhi Ray, Columbia University, USA
Suman Jana, Columbia University, USA

ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2021
Pages 690–702
https://doi.org/10.1145/3468264.3468607

Published: 18 August 2021

ABSTRACT

Binary type inference is a critical reverse engineering task supporting many security applications, including vulnerability analysis, binary hardening, forensics, and decompilation. It is a difficult task because source-level type information is often stripped during compilation, leaving only binaries with untyped memory and register accesses. Existing approaches rely on hand-coded type inference rules defined by domain experts, which are brittle and require nontrivial effort to maintain and update. Even though machine learning approaches have shown promise at automatically learning the inference rules, their accuracy is still low, especially for optimized binaries.

We present StateFormer, a new neural architecture that is adept at accurate and robust type inference. StateFormer follows a two-step transfer learning paradigm. In the pretraining step, the model is trained with Generative State Modeling (GSM), a novel task that we design to teach the model to statically approximate execution effects of assembly instructions in both forward and backward directions. In the finetuning step, the pretrained model learns to use its knowledge of operational semantics to infer types.

We evaluate StateFormer on a corpus of 33 popular open-source software projects containing over 1.67 billion variables of different types. The programs are compiled with GCC and LLVM at four optimization levels (O0-O3) and with three LLVM-based obfuscation passes. StateFormer outperforms state-of-the-art ML-based tools by 14.6% in recovering types for both function arguments and variables. Our ablation studies show that GSM improves type inference accuracy by 33%.
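The Generative State Modeling task described in the abstract can be pictured with a toy example. The sketch below is not StateFormer's implementation or data format. It assumes a hypothetical three-instruction register machine, and the helpers `execute` and `mask_states` are invented for illustration. It records the register state after each instruction and then hides a random subset of the values. A model trained to fill in the hidden values has to reason from earlier instructions (forward) or from later states (backward), which is one plausible reading of learning execution effects in both directions.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token for a hidden value


def execute(program, regs):
    """Run a toy program and return the register state after each instruction.

    The three opcodes are an invented toy ISA, not real assembly.
    """
    regs = dict(regs)
    trace = []
    for op, dst, src in program:
        if op == "mov":       # dst <- immediate value
            regs[dst] = src
        elif op == "add":     # dst <- dst + value of register src
            regs[dst] = regs[dst] + regs[src]
        elif op == "copy":    # dst <- value of register src
            regs[dst] = regs[src]
        else:
            raise ValueError(f"unknown opcode: {op}")
        trace.append(dict(regs))
    return trace


def mask_states(trace, mask_rate, rng):
    """Hide a random subset of register values.

    Returns the masked trace and a dict mapping (step, register) to the
    hidden value, which would serve as the prediction target.
    """
    masked, targets = [], {}
    for step, state in enumerate(trace):
        row = {}
        for reg, value in state.items():
            if rng.random() < mask_rate:
                row[reg] = MASK
                targets[(step, reg)] = value
            else:
                row[reg] = value
        masked.append(row)
    return masked, targets


program = [
    ("mov", "r0", 2),
    ("mov", "r1", 5),
    ("add", "r0", "r1"),
    ("copy", "r2", "r0"),
]
trace = execute(program, {"r0": 0, "r1": 0, "r2": 0})
masked, targets = mask_states(trace, mask_rate=0.3, rng=random.Random(0))
```

In this toy setting, a hidden `r0` after the `add` can be recovered either from the two preceding `mov` instructions or from the later `copy` into `r2`, which is the kind of two-directional reasoning the pretraining step is meant to encourage.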

Index Terms

StateFormer: fine-grained type recovery from binaries using generative state modeling
1. Computing methodologies
  1. Machine learning
2. Security and privacy
  1. Software and application security
    1. Software reverse engineering

Published in
ESEC/FSE 2021: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2021
1690 pages
ISBN:9781450385626
DOI:10.1145/3468264
General Chairs:
Diomidis Spinellis, Athens University of Economics and Business, Greece
Georgios Gousios, Facebook, Netherlands / Delft University of Technology, Netherlands
Program Chairs:
Marsha Chechik, University of Toronto, Canada
Massimiliano Di Penta, University of Sannio, Italy
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Artifacts Available / v1.1
- Artifacts Evaluated & Reusable / v1.1
Author Tags
Machine Learning for Program Analysis
Reverse Engineering
Transfer Learning
Type Inference
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%