CodeDiff: A Malware Vulnerability Detection Tool Based on Binary File Similarity for Edge Computing Platform

Wang, Kang; Yan, Longchuan; Chu, Zihao; Guo, Yonghe; Liu, Yongji; Cui, Lei; Hao, Zhiyu

doi:10.1007/978-3-031-19211-1_42


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13473)

Included in the following conference series:

International Conference on Wireless Algorithms, Systems, and Applications


Abstract

Malware detection has become a hot research topic as the Internet of Things (IoT) and edge computing have grown in popularity. In particular, various malware exploits firmware vulnerabilities on hardware platforms, resulting in significant financial losses for both IoT users and edge platform providers. In this paper, we propose CodeDiff, a new approach to malware vulnerability detection on IoT and edge computing platforms based on binary file similarity detection. CodeDiff is an unsupervised learning method that uses both semantic and structural information for binary diffing and does not require labeled data. Using SkipGram with Negative Sampling, we generate the word vocabulary for the instruction data. A Graph AutoEncoder is then used to embed both the semantic and the structural information into a representation matrix for the control-flow graph (CFG). Next, we employ an Improved Graph AutoEncoder to fuse the function structures, function characteristics, and function features into a fusion matrix. Finally, we propose a specific matrix comparison that produces high-accuracy similarity results in a short amount of time. We test the prototype on binaries from the OpenSSL and Curl datasets. The results show that CodeDiff performs well on binary file similarity detection, which helps identify malware vulnerabilities and improves the security of Internet of Things platforms.
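To make the embed-then-compare idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes each CFG basic block already has a small feature vector (standing in for instruction embeddings that SkipGram with Negative Sampling would learn), applies one untrained graph-convolution step of the kind used in graph autoencoder encoders, mean-pools the result into one vector per function, and compares functions by cosine similarity. All function names, the toy graphs, and the feature values are hypothetical; the trained encoder, the fusion step, and the paper's matrix comparison are not reproduced here.

```python
import math


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected view of a CFG."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loops keep each block's own features
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(a_hat, x):
    """One untrained propagation step Z = A_hat @ X (no learned weights)."""
    n, d = len(x), len(x[0])
    return [[sum(a_hat[i][k] * x[k][j] for k in range(n)) for j in range(d)]
            for i in range(n)]


def mean_pool(z):
    """Average the block embeddings into a single function-level vector."""
    return [sum(row[j] for row in z) / len(z) for j in range(len(z[0]))]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def embed_cfg(n, edges, features):
    """Hypothetical helper: CFG (blocks, edges, features) -> one vector."""
    return mean_pool(propagate(normalized_adjacency(n, edges), features))


# Toy CFGs: three basic blocks each, with made-up 2-d block features.
func_a = embed_cfg(3, [(0, 1), (1, 2)], [[1, 0], [0, 1], [1, 1]])
func_a_copy = embed_cfg(3, [(0, 1), (1, 2)], [[1, 0], [0, 1], [1, 1]])
func_b = embed_cfg(3, [(0, 1), (0, 2)], [[0, 1], [0, 1], [1, 0]])

sim_same = cosine(func_a, func_a_copy)
sim_diff = cosine(func_a, func_b)
print(f"identical CFGs: {sim_same:.3f}, different CFGs: {sim_diff:.3f}")
```

In a real pipeline the propagation step would carry learned weights trained with a reconstruction loss on the adjacency matrix, and the block features would come from the instruction embeddings; mean pooling and cosine similarity are used here only because they are the simplest way to turn two graphs into a comparable score.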



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62072453, 61972392) and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2020164).

Author information

Authors and Affiliations

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
Kang Wang, Yongji Liu, Lei Cui & Zhiyu Hao
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
Kang Wang
Information and Telecommunication Branch, State Grid, Beijing, China
Longchuan Yan & Yonghe Guo
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Zihao Chu


Corresponding author

Correspondence to Yongji Liu.

Editor information

Editors and Affiliations

Dalian University of Technology, Dalian, China
Lei Wang
Ben-Gurion University of the Negev, Beer-Sheva, Israel
Michael Segal
Chang Gung University, Taiwan, China
Jenhui Chen
Tianjin University, Tianjin, China
Tie Qiu


About this paper

Cite this paper

Wang, K. et al. (2022). CodeDiff: A Malware Vulnerability Detection Tool Based on Binary File Similarity for Edge Computing Platform. In: Wang, L., Segal, M., Chen, J., Qiu, T. (eds) Wireless Algorithms, Systems, and Applications. WASA 2022. Lecture Notes in Computer Science, vol 13473. Springer, Cham. https://doi.org/10.1007/978-3-031-19211-1_42


DOI: https://doi.org/10.1007/978-3-031-19211-1_42
Published: 17 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19210-4
Online ISBN: 978-3-031-19211-1
eBook Packages: Computer Science, Computer Science (R0)
