CodeDiff: A Malware Vulnerability Detection Tool Based on Binary File Similarity for Edge Computing Platform

  • Conference paper
Wireless Algorithms, Systems, and Applications (WASA 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13473))

Abstract

Malware detection has become a hot research topic as the Internet of Things (IoT) and edge computing have grown in popularity. In particular, many malware families exploit firmware vulnerabilities on hardware platforms, causing significant financial losses for both IoT users and edge platform providers. In this paper, we propose CodeDiff, a novel approach to malware vulnerability detection on IoT and edge computing platforms based on binary file similarity detection. CodeDiff is an unsupervised learning method that uses both semantic and structural information for binary diffing and does not require labeled data. Using SkipGram with Negative Sampling, we build the word vocabulary for the instruction data. A Graph AutoEncoder is then used to embed both the semantic and the structural information into a representation matrix for the control flow graph (CFG). Next, we employ an Improved Graph AutoEncoder to fuse the function structures, function characteristics and function features into a fusion matrix. Finally, we propose a dedicated matrix comparison that produces high-accuracy similarity results in a short amount of time. We test the prototype on the binary datasets OpenSSL and Curl. The results show that CodeDiff performs well on binary file similarity detection, which helps identify malware vulnerabilities and improves the security of Internet of Things platforms.
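The abstract does not spell out the matrix comparison step, so the following is only a minimal sketch of the general idea of comparing two binaries through per-function embedding vectors. It assumes each function has already been embedded as a fixed-length vector, and it substitutes plain cosine similarity with greedy one-to-one matching for the paper's comparison. The names `cosine`, `similarity_matrix` and `greedy_match`, the 0.8 threshold, and the toy vectors are all hypothetical and do not come from CodeDiff.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(emb_a, emb_b):
    """Entry [i][j] is the similarity of function i in binary A and function j in binary B."""
    return [[cosine(u, v) for v in emb_b] for u in emb_a]


def greedy_match(sim, threshold=0.8):
    """Pair functions one-to-one, taking the most similar pairs first.

    Pairs scoring below the (assumed) threshold are left unmatched.
    """
    candidates = sorted(
        ((score, i, j) for i, row in enumerate(sim) for j, score in enumerate(row)),
        reverse=True,
    )
    used_rows, used_cols, pairs = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # every remaining candidate scores lower still
        if i in used_rows or j in used_cols:
            continue
        used_rows.add(i)
        used_cols.add(j)
        pairs.append((i, j, score))
    return sorted(pairs)


# Toy embeddings standing in for the functions of two binaries (made-up values).
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb_b = [[0.0, 2.0], [3.0, 0.1], [-1.0, 0.0]]

sim = similarity_matrix(emb_a, emb_b)
pairs = greedy_match(sim)
for i, j, score in pairs:
    print(f"A[{i}] <-> B[{j}]  similarity={score:.3f}")
```

In this toy run the third function of binary A stays unmatched because none of its scores reaches the threshold. A tool of this kind could report such unmatched functions as changed or newly added code.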



Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 62072453 and 61972392) and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2020164).

Author information

Corresponding author

Correspondence to Yongji Liu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, K. et al. (2022). CodeDiff: A Malware Vulnerability Detection Tool Based on Binary File Similarity for Edge Computing Platform. In: Wang, L., Segal, M., Chen, J., Qiu, T. (eds) Wireless Algorithms, Systems, and Applications. WASA 2022. Lecture Notes in Computer Science, vol 13473. Springer, Cham. https://doi.org/10.1007/978-3-031-19211-1_42

  • DOI: https://doi.org/10.1007/978-3-031-19211-1_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19210-4

  • Online ISBN: 978-3-031-19211-1

  • eBook Packages: Computer Science, Computer Science (R0)
