Abstract
Malware detection has become a hot research topic as the Internet of Things (IoT) and edge computing have grown in popularity. In particular, various malware exploits firmware vulnerabilities on hardware platforms, resulting in significant financial losses for both IoT users and edge platform providers. In this paper, we propose CodeDiff, a new approach to malware vulnerability detection on IoT and edge computing platforms based on binary file similarity detection. CodeDiff is an unsupervised learning method that uses both semantic and structural information for binary diffing and does not require labeled data. Using SkipGram with Negative Sampling, we generate the word vocabulary for the instruction data. A Graph AutoEncoder is then used to embed both the semantic and structural information into a representation matrix for the control flow graph (CFG). Next, we employ an Improved Graph AutoEncoder to fuse the function structures, function characteristics and function features into a fusion matrix. Finally, we propose a dedicated matrix comparison that produces highly accurate similarity results in a short amount of time. We test the prototype on the binary datasets OpenSSL and Curl. The results show that CodeDiff achieves high performance on binary file similarity detection, which helps identify malware vulnerabilities and improves the security of Internet of Things platforms.
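The final comparison step can be illustrated with a minimal sketch. The abstract does not specify how CodeDiff's matrix comparison works, so the code below is an assumption-based stand-in, not the authors' method: it treats each binary as a list of per-function embedding vectors, builds a cosine similarity matrix between two binaries, and pairs functions greedily. The names `cosine`, `similarity_matrix`, `greedy_match`, the `threshold` value and the toy embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(emb_a, emb_b):
    """Entry [i][j] is the similarity of function i in binary A and function j in binary B."""
    return [[cosine(u, v) for v in emb_b] for u in emb_a]


def greedy_match(sim, threshold=0.8):
    """Pair functions in descending order of similarity, using each function at most once.

    Assumption: a simple greedy pairing, chosen here only for illustration.
    """
    candidates = sorted(
        ((score, i, j) for i, row in enumerate(sim) for j, score in enumerate(row)),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining candidates are all below the threshold
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches.append((i, j, score))
    return matches


# Hypothetical function embeddings for two binaries (toy values, not real data).
binary_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
binary_b = [[0.0, 2.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

sim = similarity_matrix(binary_a, binary_b)
matches = greedy_match(sim)
```

In this toy input, function 0 of the first binary pairs with function 1 of the second and vice versa, while the third function of the second binary stays unmatched. An unmatched or low-similarity function is the kind of signal a binary diffing tool would flag for closer inspection.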
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (Grant No. 62072453, 61972392) and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2020164).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, K. et al. (2022). CodeDiff: A Malware Vulnerability Detection Tool Based on Binary File Similarity for Edge Computing Platform. In: Wang, L., Segal, M., Chen, J., Qiu, T. (eds) Wireless Algorithms, Systems, and Applications. WASA 2022. Lecture Notes in Computer Science, vol 13473. Springer, Cham. https://doi.org/10.1007/978-3-031-19211-1_42
DOI: https://doi.org/10.1007/978-3-031-19211-1_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19210-4
Online ISBN: 978-3-031-19211-1
eBook Packages: Computer Science (R0)