ELISA: ELiciting ISA of Raw Binaries for Fine-Grained Code and Data Separation

De Nicolao, Pietro; Pogliani, Marcello; Polino, Mario; Carminati, Michele; Quarta, Davide; Zanero, Stefano

doi:10.1007/978-3-319-93411-2_16

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10885)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

Static binary analysis techniques are widely used to reconstruct the behavior of software and to discover vulnerabilities in it when source code is not available. To avoid errors due to misinterpreting data as machine instructions (or vice versa), disassemblers and static analysis tools must precisely infer the boundaries between code and data. However, this information is often not readily available. Worse, compilers may embed small chunks of data inside the code section. Most state-of-the-art approaches to separating code and data are rooted in recursive traversal disassembly, which has severe limitations when dealing with indirect control instructions. We propose ELISA, a technique to separate code from data and ease the static analysis of executable files. ELISA leverages supervised sequential learning techniques to locate the boundaries of the code section(s) in header-less binary files, and to predict the instruction boundaries inside the identified code section. As a preliminary step, if the Instruction Set Architecture (ISA) of the binary is unknown, ELISA uses a logistic regression model to identify the correct ISA from the file content. We provide a comprehensive evaluation on a dataset of executables compiled for different ISAs, and we show that our method is capable of identifying code sections with a byte-level accuracy (F1 score) ranging from $98.13\%$ to over $99.9\%$ depending on the ISA. Fine-grained separation of code from embedded data on x86, x86-64 and ARM executables is accomplished with an accuracy of over $99.9\%$.
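To make the byte-level F1 metric mentioned above concrete, the following is a minimal sketch of how such a score could be computed when every byte of a file carries a binary label (1 for code, 0 for data). It is not the authors' evaluation code: the function name `byte_level_f1`, the label encoding, and the 10-byte example are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def byte_level_f1(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Return the F1 score of the 'code' class (label 1) over per-byte labels.

    Assumption for this sketch: each byte is labeled 1 (code) or 0 (data).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # code predicted as code
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # data predicted as code
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # code predicted as data

    if tp == 0:
        # No byte was correctly predicted as code: F1 is defined as 0 here.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 10-byte file in which bytes 2..7 are code.
truth = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
# A prediction whose code region is shifted by one byte.
pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

print(round(byte_level_f1(truth, pred), 4))  # 0.8333
```

In this toy case one code byte is missed and one data byte is mislabeled as code, so precision and recall are both 5/6. Because the score is computed per byte, even a one-byte shift in a section boundary lowers it.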

Notes

1. While it is possible to integrate our methodology with ISA-dependent heuristics, we show that our methodology achieves good results without ISA-specific knowledge.
2. The parameters m and n belong to the model and can be appropriately tuned; for example, in our evaluation we used grid search.
3. http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SimpleLogistic.html.
4. http://security.ece.cmu.edu/byteweight/.
5. We determine that $C=10000$ is the optimal value through grid search optimization.
6. http://www.arduino.org/products/boards/arduino-uno.
7. https://github.com/BinaryAnalysisPlatform/arm-binaries.

Acknowledgements

We would like to thank the anonymous reviewers for their suggestions that led to improving this work. This project has been supported by the Italian Ministry of University and Research under the FIRB project FACE (Formal Avenue for Chasing malwarE), grant agreement nr. RBFR13AJFT; and by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement nr. 690972.

Author information

Authors and Affiliations

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Pietro De Nicolao, Marcello Pogliani, Mario Polino, Michele Carminati, Davide Quarta & Stefano Zanero

Corresponding author

Correspondence to Pietro De Nicolao.

Editor information

Editors and Affiliations

Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Cristiano Giuffrida
CEA, Palaiseau, France
Sébastien Bardin
Université Paris-Saclay, Evry, France
Gregory Blanc

About this paper

Cite this paper

De Nicolao, P., Pogliani, M., Polino, M., Carminati, M., Quarta, D., Zanero, S. (2018). ELISA: ELiciting ISA of Raw Binaries for Fine-Grained Code and Data Separation. In: Giuffrida, C., Bardin, S., Blanc, G. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2018. Lecture Notes in Computer Science, vol 10885. Springer, Cham. https://doi.org/10.1007/978-3-319-93411-2_16

DOI: https://doi.org/10.1007/978-3-319-93411-2_16
Published: 08 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93410-5
Online ISBN: 978-3-319-93411-2
eBook Packages: Computer Science, Computer Science (R0)
