Skip to main content
Log in

Bidirectional Grammars for Machine-Code Decoding and Encoding

  • Published:
Journal of Automated Reasoning Aims and scope Submit manuscript

Abstract

Binary analysis, which analyzes machine code, requires a decoder for converting bits into abstract syntax of machine instructions. Binary rewriting requires an encoder for converting instructions to bits. We propose a domain-specific language that enables the specification of both decoding and encoding in a single bidirectional grammar. With dependent types, a bigrammar enables the extraction of an executable decoder and encoder as well as a correctness proof showing their consistency. The bigrammar DSL is embedded in Coq with machine-checked proofs. We have used the bigrammar DSL to specify the decoding and encoding of subsets of the x86-32 and MIPS instruction sets. We have also extracted an executable decoder and encoder from the x86 bigrammar with competitive performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

Notes

  1. NaCl on x86-32 restricts both memory reads and writes through hardware segmentation; NaCl on x86-64 and ARM, on the other hand, instruments only memory writes and does not protect confidentiality.

  2. Henceforth, we just write x86 for its 32-bit version.

  3. The x86 bigrammar is under x86model/Model and the MIPS bigrammar under MIPSmodel. For the x86model, the bigrammar DSL is in BiGrammar.v and an x86 bigrammar is in X86BG.v and X86BG_ins.v.

  4. Our parser implementation actually returns a list of values during parsing, for simplicity of presentation we ignore that aspect in this paper.

  5. That is, “ret s” is defined as \({\mathtt {Some}}\ {s}\); and “\(s_1 \leftarrow v; f\)” is defined as \(\mathtt {match}\ v\ \mathtt {with}\ |\ {\mathtt {None}}\Rightarrow {\mathtt {None}}\ |\ {\mathtt {Some}}\ {s_1} \Rightarrow f\ s_1\ \mathtt {end}\).

  6. Their definitions can be found in x86model/Model/BiGrammar.v in our online GitHub code repository.

  7. We also performed a performance comparison between our decoder and Linux’s objdump. The derivative-based x86 decoder introduced in Rocksalt [21] was about 100 times slower than objdump. We recently made improvements to produce a table-based decoder from the x86 grammar; its performance is within 2x compared to objdump.

References

  1. Abadi, M., Budiu, M., Erlingsson, Ú., Ligatti, J.: Control-flow integrity. In: 12th ACM Conference on Computer and Communications Security (CCS), pp. 340–353 (2005)

  2. Alimarine, A., Smetsers, S., van Weelden, A., van Eekelen, M.C.J.D., Plasmeijer, R.: There and back again: arrows for invertible programming. In: Proceedings of the ACM SIGPLAN Workshop on Haskell, pp. 86–97 (2005)

  3. Bergeron, J., Debbabi, M., Desharnais, J., Erhioui, M., Lavoie, Y., Tawbi, N.: Static detection of malicious code in executable programs. Int. J. Requir. Eng. (2001)

  4. Bohannon, A., Foster, J.N., Pierce, B.C., Pilkiewicz, A., Schmitt, A.: Boomerang: resourceful lenses for string data. In: 35th ACM Symposium on Principles of Programming Languages (POPL), pp. 407–419 (2008)

  5. Brabrand, C., Møller, A., Schwartzbach, M.I.: Dual syntax for XML languages. In: 10th International Symposium on Database Programming Languages (DBPL), pp. 27–41 (2005)

  6. Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: a binary analysis platform. In: Computer Aided Verification (CAV), pp. 463–469 (2011)

  7. Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11, 481–494 (1964)

    Article  MathSciNet  MATH  Google Scholar 

  8. Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: 12th Usenix Security Symposium, pp. 169–186 (2003)

  9. The Coq proof assistant. https://coq.inria.fr/

  10. Ford, B.: Parsing expression grammars: A recognition-based syntactic foundation. In: 31st ACM Symposium on Principles of Programming Languages (POPL), pp. 111–122 (2004)

  11. Fox, A.C.J.: Improved tool support for machine-code decompilation in HOL4. In: 6th International Conference on Interactive Theorem Proving (ITP), pp. 187–202 (2015)

  12. Godefroid, P., Levin, M.Y., Molnar, D.A.: Automated whitebox fuzz testing. In: Network and Distributed System Security Symposium (NDSS) (2008)

  13. Jansson, P., Jeuring, J.: Polytypic compact printing and parsing. In: 8th European Symposium on Programming (ESOP), pp. 273–287 (1999)

  14. Jourdan, J.H., Pottier, F., Leroy, X.: Validating LR(1) parsers. In: European Symposium on Programming (ESOP), pp. 397–416 (2012)

  15. Kawanaka, S., Hosoya, H.: biXid: a bidirectional transformation language for XML. In: ACM International Conference on Functional programming (ICFP), pp. 201–214 (2006)

  16. Kennedy, A.: Pickler combinators. J. Funct. Program. 14(6), 727–739 (2004)

    Article  MATH  Google Scholar 

  17. Kotha, A., Anand, K., Smithson, M., Yellareddy, G., Barua, R.: Automatic parallelization in a binary rewriter. In: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2010, pp. 547–557 (2010)

  18. Kroll, J., Dean, D.: BakerSFIeld: Bringing software fault isolation to x64. www.jkroll.com/papers/bakersfield_sfi.pdf

  19. McCamant, S., Morrisett, G.: Evaluating SFI for a CISC architecture. In: 15th Usenix Security Symposium (2006)

  20. Might, M., Darais, D., Spiewak, D.: Parsing with derivatives: a functional pearl. In: ACM International Conference on Functional programming (ICFP), pp. 189–195 (2011)

  21. Morrisett, G., Tan, G., Tassarotti, J., Tristan, J.B., Gan, E.: Rocksalt: better, faster, stronger SFI for the x86. In: ACM Conference on Programming Language Design and Implementation (PLDI), pp. 395–404 (2012)

  22. Owens, S., Reppy, J., Turon, A.: Regular-expression derivatives re-examined. J. Funct. Program. 19, 173–190 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  23. Ramsey, N., Fernández, M.F.: Specifying representations of machine instructions. ACM Trans. Program. Lang. Syst. 19(3), 492–524 (1997)

    Article  Google Scholar 

  24. Rendel, T., Ostermann, K.: Invertible syntax descriptions: unifying parsing and pretty printing. In: Third ACM Haskell Symposium on Haskell, pp. 1–12 (2010)

  25. Reps, T., Lim, J., Thakur, A., Balakrishnan, G., Lal, A.: There’s plenty of room at the bottom: analyzing and verifying machine code. In: Computer Aided Verification (CAV), pp. 41–56 (2010)

  26. Sepp, A., Kranz, J., Simon, A.: GDSL: a generic decoder specification language for interpreting machine language. In: Tools for Automatic Program Analysis, pp. 53–64 (2012)

  27. Song, D., Brumley, D., Yin, H., Caballero, J., Jager, I., Kang, M.G., Liang, Z., Newsome, J., Poosankam, P., Saxena, P.: BitBlaze: A new approach to computer security via binary analysis. In: Proceedings of the 4th International Conference on Information Systems Security (2008)

  28. Tan, G., Morrisett, G.: Bidirectional grammars for machine-code decoding and encoding. In: 8th International Conference on Verified Software: Theories, Tools, and Experiments (VSTTE), pp. 73–89 (2016)

  29. Wahbe, R., Lucco, S., Anderson, T., Graham, S.: Efficient software-based fault isolation. In: ACM SIGOPS Symposium on Operating Systems Principles (SOSP), pp. 203–216. ACM Press, New York (1993)

  30. Wartell, R., Mohan, V., Hamlen, K.W., Lin, Z.: Securing untrusted code via compiler-agnostic binary rewriting. In: Proceedings of the 28th Annual Computer Security Applications Conference, pp. 299–308 (2012)

  31. Xu, Z., Miller, B., Reps, T.: Safety checking of machine code. In: ACM Conference on Programming Language Design and Implementation (PLDI), pp. 70–82 (2000)

  32. Yee, B., Sehr, D., Dardyk, G., Chen, B., Muth, R., Ormandy, T., Okasaka, S., Narula, N., Fullagar, N.: Native client: A sandbox for portable, untrusted x86 native code. In: IEEE Symposium on Security and Privacy (S&P) (2009)

Download references

Acknowledgements

We thank the anonymous reviewers for their constructive comments. This research is supported by NSF Grants CCF-1624124 and CNS-1624126. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of any of the above organizations or any person connected with them.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gang Tan.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Tan, G., Morrisett, G. Bidirectional Grammars for Machine-Code Decoding and Encoding. J Autom Reasoning 60, 257–277 (2018). https://doi.org/10.1007/s10817-017-9429-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10817-017-9429-1

Keywords

Navigation