Abstract
Static binary analysis techniques are widely used to reconstruct the behavior and discover vulnerabilities in software when source code is not available. To avoid errors due to mis-interpreting data as machine instructions (or vice-versa), disassemblers and static analysis tools must precisely infer the boundaries between code and data. However, this information is often not readily available. Worse, compilers may embed small chunks of data inside the code section. Most state of the art approaches to separate code and data are rooted on recursive traversal disassembly, with severe limitations when dealing with indirect control instructions. We propose ELISA, a technique to separate code from data and ease the static analysis of executable files. ELISA leverages supervised sequential learning techniques to locate the code section(s) boundaries of header-less binary files, and to predict the instruction boundaries inside the identified code section. As a preliminary step, if the Instruction Set Architecture (ISA) of the binary is unknown, ELISA leverages a logistic regression model to identify the correct ISA from the file content. We provide a comprehensive evaluation on a dataset of executables compiled for different ISAs, and we show that our method is capable to identify code sections with a byte-level accuracy (F1 score) ranging from \(98.13\%\) to over \(99.9\%\) depending on the ISA. Fine-grained separation of code from embedded data on x86, x86-64 and ARM executables is accomplished with an accuracy of over \(99.9\%\).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
While it is possible to integrate our methodology with ISA-dependent heuristics, we show that our methodology achieves good results without ISA-specific knowledge.
- 2.
The parameters m and n belong to the model and can be appropriately tuned; for example, in our evaluation we used grid search.
- 3.
- 4.
- 5.
We determine that \(C=10000\) is the optimal value through grid search optimization.
- 6.
- 7.
References
Song, D., et al.: BitBlaze: a new approach to computer security via binary analysis. In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 1–25. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89862-7_1
Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: a binary analysis platform. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 463–469. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_37
Shoshitaishvili, Y., Wang, R., Salls, C., Stephens, N., Polino, M., Dutcher, A., Grosen, J., Feng, S., Hauser, C., Kruegel, C., Vigna, G.: Sok: (state of) the art of war: offensive techniques in binary analysis. In: Proceedings of 2016 IEEE Symposium on Security and Privacy, SP, pp. 138–157 (2016)
Shoshitaishvili, Y., Wang, R., Hauser, C., Kruegel, C., Vigna, G.: Firmalice-automatic detection of authentication bypass vulnerabilities in binary firmware. In: Proceedings of 2015 Network and Distributed System Security Symposium, NDSS (2015)
Haller, I., Slowinska, A., Neugschwandtner, M., Bos, H.: Dowsing for overflows: a guided fuzzer to find buffer boundary violations. In: Proceedings of 22nd USENIX Security Symposium, USENIX Security 2013, pp. 49–64 (2013)
Corina, J., Machiry, A., Salls, C., Shoshitaishvili, Y., Hao, S., Kruegel, C., Vigna, G.: Difuze: interface aware fuzzing for kernel drivers. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, pp. 2123–2138 (2017)
Stephens, N., Grosen, J., Salls, C., Dutcher, A., Wang, R., Corbetta, J., Shoshitaishvili, Y., Kruegel, C., Vigna, G.: Driller: augmenting fuzzing through selective symbolic execution. In: Proceedings of 2016 Network and Distributed System Security Symposium, NDSS, vol. 16, pp. 1–16 (2016)
Cova, M., Felmetsger, V., Banks, G., Vigna, G.: Static detection of vulnerabilities in x86 executables. In: Proceedings of 22nd Annual Computer Security Applications Conference, ACSAC, pp. 269–278. IEEE (2006)
Kolsek, M.: Did microsoft just manually patch their equation editor executable? Why yes, yes they did. (cve-2017-11882) (2017). https://0patch.blogspot.com/2017/11/did-microsoft-just-manually-patch-their.html
Wartell, R., Zhou, Y., Hamlen, K.W., Kantarcioglu, M., Thuraisingham, B.: Differentiating code from data in x86 binaries. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS (LNAI), vol. 6913, pp. 522–536. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23808-6_34
Andriesse, D., Chen, X., van der Veen, V., Slowinska, A., Bos, H.: An in-depth analysis of disassembly on full-scale x86/x64 binaries. In: Proceedings of 25th USENIX Security Symposium, USENIX Security 2016, pp. 583–600 (2016)
Andriesse, D., Slowinska, A., Bos, H.: Compiler-agnostic function detection in binaries. In: Proceedings of 2017 IEEE European Symposium on Security and Privacy, Euro S&P, pp. 177–189. IEEE (2017)
Chen, J.Y., Shen, B.Y., Ou, Q.H., Yang, W., Hsu, W.C.: Effective code discovery for ARM/Thumb mixed ISA binaries in a static binary translator. In: Proceedings of 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES 2013, pp. 1–10 (2013)
Clemens, J.: Automatic classification of object code using machine learning. Digit. Investig. 14, S156–S162 (2015)
Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of 18th International Conference on Machine Learning, ICML 2001, pp. 282–289. Morgan Kaufmann Publishers Inc. (2001)
Taskar, B., Guestrin, C., Koller, D.: Max-margin Markov networks. In: Advances in Neural Information Processing Systems, pp. 25–32 (2004)
Lacoste-Julien, S., Jaggi, M., Schmidt, M., Pletscher, P.: Block-coordinate Frank-Wolfe optimization for structural SVMs. In: Proceedings of 30th International Conference on Machine Learning, ICML 2013, pp. 53–61 (2013)
Müller, A.C., Behnke, S.: PyStruct - learning structured prediction in python. J. Mach. Learn. Res. 15, 2055–2060 (2014)
Buluç, A., Fineman, J.T., Frigo, M., Gilbert, J.R., Leiserson, C.E.: Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. In: Proceedings of 21st Annual Symposium on Parallelism in algorithms and architectures, SPAA 2009, pp. 233–244. ACM (2009)
Arduino: Built-In Examples. https://www.arduino.cc/en/Tutorial/BuiltInExamples
NVIDIA: CUDA Samples. http://docs.nvidia.com/cuda/cuda-samples/index.html
Legitimate Business Syndicate: The cLEMENCy Architecture (2017). https://blog.legitbs.net/2017/07/the-clemency-architecture.html
Bao, T., Burket, J., Woo, M., Turner, R., Brumley, D.: ByteWeight: learning to recognize functions in binary code. In: Proceedings of 23rd USENIX Security Symposium, pp. 845–860 (2014)
Karampatziakis, N.: Static analysis of binary executables using structural SVMs. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems 23, pp. 1063–1071. Curran Associates, Inc. (2010)
Microsoft: Universal Windows Platform (UWP) app samples. https://github.com/Microsoft/Windows-universal-samples
Microsoft: Dia2dump sample. https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/dia2dump-sample
Eager, M.J.: Introduction to the DWARF debugging format (2012). http://www.dwarfstd.org/doc/Debugging
Linn, C., Debray, S.: Obfuscation of executable code to improve resistance to static disassembly. In: Proceedings of 10th ACM Conference on Computer and Communications Security, CCS 2003, pp. 290–299. ACM (2003)
Kruegel, C., Robertson, W., Valeur, F., Vigna, G.: Static disassembly of obfuscated binaries. In: Proceedings of 13th USENIX Security Symposium (2004)
Rosenblum, N., Zhu, X., Miller, B., Hunt, K.: Learning to analyze binary computer code. In: Proceedings of 23th AAAI Conference on Artificial Intelligence, AAAI 2008, pp. 798–804. AAAI Press (2008)
Shin, E.C.R., Song, D., Moazzezi, R.: Recognizing functions in binaries with neural networks. In: Proceedings of 24th USENIX Security Symposium, pp. 611–626 (2015)
McDaniel, M., Heydari, M.H.: Content based file type detection algorithms. In: Proceedings of 36th Annual Hawaii International Conference on System Sciences (2003)
Li, W.J., Wang, K., Stolfo, S.J., Herzog, B.: Fileprints: identifying file types by n-gram analysis. In: Proceedings of the 6th Annual IEEE SMC Information Assurance Workshop, IAW 2005, pp. 64–71. IEEE (2005)
Sportiello, L., Zanero, S.: Context-based file block classification. In: Peterson, G., Shenoi, S. (eds.) DigitalForensics 2012. IAICT, vol. 383, pp. 67–82. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33962-2_5
Penrose, P., Macfarlane, R., Buchanan, W.J.: Approaches to the classification of high entropy file fragments. Digit. Investig. 10(4), 372–384 (2013)
Granboulan, L.: cpu_rec: Recognize cpu instructions in an arbitrary binary file (2017). https://github.com/airbus-seclab/cpu_rec
Oberhumer, M.F., Molnár, L., Reiser, J.F.: UPX: the Ultimate Packer for eXecutables. https://upx.github.io/
Acknowledgements
We would like to thank the anonymous reviewers for their suggestions that led to improving this work. This project has been supported by the Italian Ministry of University and Research under the FIRB project FACE (Formal Avenue for Chasing malwarE), grant agreement nr. RBFR13AJFT; and by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement nr. 690972.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
De Nicolao, P., Pogliani, M., Polino, M., Carminati, M., Quarta, D., Zanero, S. (2018). ELISA: ELiciting ISA of Raw Binaries for Fine-Grained Code and Data Separation. In: Giuffrida, C., Bardin, S., Blanc, G. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2018. Lecture Notes in Computer Science(), vol 10885. Springer, Cham. https://doi.org/10.1007/978-3-319-93411-2_16
Download citation
DOI: https://doi.org/10.1007/978-3-319-93411-2_16
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93410-5
Online ISBN: 978-3-319-93411-2
eBook Packages: Computer ScienceComputer Science (R0)