Robust Network-Based Binary-to-Vector Encoding for Scalable IoT Binary File Retrieval

  • Conference paper
  • First Online:
Wireless Algorithms, Systems, and Applications (WASA 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10874)

Abstract

The goal of IoT binary file retrieval is to retrieve homologous binary files from a large IoT binary file database. Binary file retrieval has many applications, such as security analysis, OEM detection, and plagiarism detection. However, traditional string-based approaches struggle to retrieve binary files that contain few or obfuscated strings. To solve this problem, we propose a novel neural network-based approach that encodes a binary file into a numerical vector using non-string binary features. Moreover, because each binary file is encoded as a vector, the retrieval task can be accelerated with locality-sensitive hashing (LSH). For network training and testing, we compile 893 open-source components with 16 different compilation configurations, yielding 71,129 labeled binary file pairs. We implement a prototype called B2V and compare it with IHB, a string-based approach, on both the original and the string-obfuscated testing sets. The results show that B2V achieves a higher AUC than IHB (0.94 vs. 0.81) on the string-obfuscated testing set, while remaining comparable to IHB on the original testing set. Moreover, B2V can be easily retrained to adapt to string-obfuscated scenarios, with a 15%–20% performance improvement. In the interest of open science, we also make our dataset publicly available to seed future improvements.
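The abstract notes that once every binary file is encoded as a fixed-length vector, retrieval can be accelerated with locality-sensitive hashing. The sketch below illustrates one common LSH family for cosine similarity, random-hyperplane (SimHash-style) hashing, over such vectors. It is not B2V's implementation: the class name `HyperplaneLSH`, the choice of cosine similarity, and the `n_bits`/`n_tables` values are assumptions for illustration, and the neural encoder is not shown (the vectors are assumed to be given).

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index (an assumption, not B2V's code)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One set of n_bits random Gaussian hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    @staticmethod
    def _signature(planes, v):
        # Each bit records which side of one hyperplane the vector falls on.
        return tuple(
            1 if sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 else 0
            for p in planes
        )

    def add(self, key, v):
        """Store vector v under key and insert it into every hash table."""
        self.vectors[key] = v
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, v)].append(key)

    def query(self, v, top_k=3):
        """Return up to top_k stored keys most similar to v."""
        # Candidates: keys sharing a bucket with v in at least one table.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(planes, v), []))
        # Re-rank only the candidates by exact cosine similarity.
        ranked = sorted(
            candidates, key=lambda k: cosine(v, self.vectors[k]), reverse=True
        )
        return ranked[:top_k]
```

Only vectors that share a bucket with the query are compared exactly, which is what avoids a linear scan of the whole database. In this family, more bits per table make buckets smaller (fewer candidates, lower recall per table), while more tables raise the chance that a truly similar vector collides with the query in at least one of them.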

Acknowledgment

This work was supported by the National Key Research and Development Program of China (2016YFB0800202); the National Natural Science Foundation of China under Grant No. U1636120; the Fundamental Theory and Cutting Edge Technology Research Program of the Institute of Information Engineering, CAS; SKLOIS (No. Y7Z0361104 and No. Y7Z0311104); the Key Program of the National Natural Science Foundation of China (U1766215); the Key Research Program of the Chinese MIIT under Grant No. JCKY2016602B001; the Beijing Municipal Science & Technology Commission under Grant No. Z161100002616032; and the Science and Technology Project of State Grid Corporation of China (No. 52110418001K).

Author information

Corresponding author

Correspondence to Hong Li.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Chen, Y., Li, H., Ma, Y., Shi, Z., Sun, L. (2018). Robust Network-Based Binary-to-Vector Encoding for Scalable IoT Binary File Retrieval. In: Chellappan, S., Cheng, W., Li, W. (eds) Wireless Algorithms, Systems, and Applications. WASA 2018. Lecture Notes in Computer Science, vol 10874. Springer, Cham. https://doi.org/10.1007/978-3-319-94268-1_5

  • DOI: https://doi.org/10.1007/978-3-319-94268-1_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94267-4

  • Online ISBN: 978-3-319-94268-1

  • eBook Packages: Computer Science, Computer Science (R0)
