Abstract
The general consensus is that disassembly of binaries is undecidable. The difficulty lies in distinguishing instructions from data and in resolving indirections. Furthermore, binaries can behave in “weird” ways that have no counterpart in assembly languages: for example, instructions may overlap or use other instructions as data. Yet it is equally accepted that, for a large share of production binaries, disassembly works sufficiently well for the use cases at hand. This paper addresses two questions. For which binaries is disassembly decidable? And for which binaries does disassembly become decidable if an external oracle provides, e.g., the set of instruction addresses or the resolution of indirections? We present a set of five theorems on the decidability of disassembly, each corresponding to a use case. All five theorems are accompanied by a proof of correctness based on bisimilarity between the input binary and the output assembly program, and all have been formalized in the Isabelle/HOL theorem prover.
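To make the notions of overlapping instructions and an instruction-address oracle concrete, the following is a minimal sketch over a hypothetical four-opcode toy instruction set. The opcode table, the function names, and the example byte string are all assumptions made for illustration; none of them come from the paper, whose results are stated for real binaries and formalized in Isabelle/HOL. The sketch only shows why a linear sweep can miss an instruction that lives inside another instruction's operand, and how a given set of instruction addresses reduces the task to per-address decoding.

```python
# Toy ISA (hypothetical, for illustration only): opcode -> (mnemonic, length in bytes).
OPCODES = {0x00: ("HALT", 1), 0x01: ("PUSH", 2), 0x02: ("ADD", 1), 0x03: ("JMP", 2)}


def decode_at(code, addr):
    """Decode one instruction at addr; return (text, length), or None if invalid."""
    if addr < 0 or addr >= len(code) or code[addr] not in OPCODES:
        return None
    mnemonic, length = OPCODES[code[addr]]
    if addr + length > len(code):
        return None
    if mnemonic == "PUSH":
        return f"PUSH {code[addr + 1]}", length
    if mnemonic == "JMP":
        rel = code[addr + 1]
        rel = rel - 256 if rel >= 128 else rel  # interpret operand as signed 8-bit
        return f"JMP {addr + length + rel}", length
    return mnemonic, length


def linear_sweep(code):
    """Decode back to back from address 0, stopping at the first invalid byte."""
    instrs, addr = {}, 0
    while addr < len(code):
        decoded = decode_at(code, addr)
        if decoded is None:
            break
        instrs[addr] = decoded
        addr += decoded[1]
    return instrs


def decode_with_oracle(code, addresses):
    """Decode exactly at the addresses supplied by an (assumed) external oracle."""
    instrs = {}
    for addr in sorted(addresses):
        decoded = decode_at(code, addr)
        if decoded is None:
            raise ValueError(f"oracle address {addr} does not decode")
        instrs[addr] = decoded
    return instrs


def overlapping(instrs):
    """Return address pairs (a, b), a < b, where b starts inside instruction a."""
    addrs = sorted(instrs)
    return [(a, b) for i, a in enumerate(addrs)
            for b in addrs[i + 1:] if b < a + instrs[a][1]]


# PUSH 2 at address 0; JMP at address 2 targets address 1, which is the
# operand byte of PUSH and also happens to be a valid ADD opcode.
CODE = bytes([0x01, 0x02, 0x03, 0xFD, 0x00])

sweep = linear_sweep(CODE)
oracle = decode_with_oracle(CODE, {0, 1, 2, 4})
```

In this toy setting, `sweep` contains instructions only at addresses 0, 2, and 4, while `oracle` additionally contains the `ADD` at address 1, and `overlapping(oracle)` reports the pair `(0, 1)`. Rendering such a pair as ordinary assembly text is exactly the kind of situation that has no direct counterpart in assembly languages.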
Acknowledgements
We would like to thank the anonymous reviewers for their insightful comments and suggestions, which helped to greatly improve the paper. This work is supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract No. N66001-21-C-4028, and by DARPA under Agreement No. HR00112090028.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Engel, D., Verbeek, F., Ravindran, B. (2024). On the Decidability of Disassembling Binaries. In: Chin, WN., Xu, Z. (eds) Theoretical Aspects of Software Engineering. TASE 2024. Lecture Notes in Computer Science, vol 14777. Springer, Cham. https://doi.org/10.1007/978-3-031-64626-3_8
DOI: https://doi.org/10.1007/978-3-031-64626-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64625-6
Online ISBN: 978-3-031-64626-3
eBook Packages: Computer Science, Computer Science (R0)