On the Decidability of Disassembling Binaries

  • Conference paper
Theoretical Aspects of Software Engineering (TASE 2024)

Abstract

The general consensus is that disassembly of binaries is undecidable. The cause lies in distinguishing instructions from data and in resolving indirections. Furthermore, binaries can behave in “weird” ways that have no counterpart in assembly languages, e.g., instructions may overlap or use other instructions as data. Yet it is equally accepted that, for a large part of production binaries, disassembly works sufficiently well for the use cases at hand. This paper addresses two questions: for which binaries is disassembly decidable? And for which binaries does disassembly become decidable if an external oracle provides, e.g., the set of instruction addresses, or resolves indirections? We present a set of five theorems on the decidability of disassembly, each corresponding to a use case. All five theorems are accompanied by a proof of correctness based on bisimilarity between the input binary and the output assembly program, and have been formalized in the Isabelle/HOL theorem prover.
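To make the notions of overlapping instructions and of an oracle that supplies instruction addresses concrete, the following is a minimal sketch over a hypothetical four-opcode toy instruction set. The opcodes, the byte string `CODE`, and all function names are assumptions made for illustration; none of them come from the paper or its Isabelle/HOL formalization. The sketch contrasts a linear sweep with a traversal that follows direct jumps, and shows that the operand byte of one instruction can itself be executed as an instruction.

```python
# Hypothetical toy ISA (an assumption, not the paper's model):
# opcode -> (mnemonic, length in bytes).
OPCODES = {0x00: ("HALT", 1), 0x01: ("PUSH", 2), 0x02: ("ADD", 1), 0x03: ("JMP", 2)}

# PUSH 2 at address 0, JMP -3 at address 2 (target: address 1), HALT at address 4.
# The operand byte of PUSH (0x02) is also a valid ADD instruction at address 1.
CODE = bytes([0x01, 0x02, 0x03, 0xFD, 0x00])


def decode(code, addr):
    """Decode one instruction at addr; return (mnemonic, length, operand or None)."""
    mnemonic, length = OPCODES[code[addr]]
    operand = code[addr + 1] if length == 2 else None
    return mnemonic, length, operand


def successors(code, addr):
    """Addresses that control flow reaches directly after the instruction at addr."""
    mnemonic, length, operand = decode(code, addr)
    if mnemonic == "HALT":
        return []
    if mnemonic == "JMP":
        rel = operand - 256 if operand >= 128 else operand  # signed 8-bit offset
        return [addr + length + rel]
    return [addr + length]


def linear_sweep(code):
    """Decode instructions back to back, starting at address 0."""
    addrs, addr = [], 0
    while addr < len(code):
        addrs.append(addr)
        addr += decode(code, addr)[1]
    return addrs


def recursive_traversal(code, entry):
    """Collect every address reachable from entry by following control flow."""
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in seen or not 0 <= addr < len(code):
            continue
        seen.add(addr)
        work.extend(successors(code, addr))
    return sorted(seen)


def overlapping(code, addrs):
    """Pairs of instruction addresses whose byte ranges intersect."""
    spans = [(a, a + decode(code, a)[1]) for a in sorted(addrs)]
    return [(a, b)
            for i, (a, a_end) in enumerate(spans)
            for b, _ in spans[i + 1:]
            if b < a_end]


if __name__ == "__main__":
    print("linear sweep:       ", linear_sweep(CODE))
    print("recursive traversal:", recursive_traversal(CODE, 0))
    print("overlapping pairs:  ", overlapping(CODE, recursive_traversal(CODE, 0)))
```

In this toy setting the two strategies disagree on the set of instruction addresses: the sweep never sees the instruction hidden inside the `PUSH` operand, while the traversal finds it only because the jump is direct. With an indirect jump the target would not be readable from the bytes, which is the situation in which an external source of instruction addresses becomes relevant.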

Notes

  1. https://dwarfstd.org

Acknowledgements

We would like to thank the anonymous reviewers for their insightful comments and suggestions, which helped to greatly improve the paper. This work is supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract No. N66001-21-C-4028, and by DARPA under Agreement No. HR00112090028.

Author information

Corresponding author

Correspondence to Daniel Engel.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Engel, D., Verbeek, F., Ravindran, B. (2024). On the Decidability of Disassembling Binaries. In: Chin, WN., Xu, Z. (eds) Theoretical Aspects of Software Engineering. TASE 2024. Lecture Notes in Computer Science, vol 14777. Springer, Cham. https://doi.org/10.1007/978-3-031-64626-3_8

  • DOI: https://doi.org/10.1007/978-3-031-64626-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64625-6

  • Online ISBN: 978-3-031-64626-3

  • eBook Packages: Computer Science (R0)
