Abstract
The goal of IoT binary file retrieval is to retrieve homologous binary files from a large IoT binary file database. Binary file retrieval has many applications, such as security analysis, OEM detection, and plagiarism detection. However, traditional string-based approaches struggle to retrieve binary files that contain few or obfuscated strings. To solve this problem, we propose a novel neural network-based approach that encodes a binary file into a numerical vector based on non-string binary features. Moreover, with this encoding method, the retrieval task can be accelerated by the locality-sensitive hashing (LSH) technique. For network training and testing, we compile 893 open-source components with 16 different compilation configurations, producing 71,129 labeled binary file pairs. We implement a prototype called B2V and compare it with IHB, a string-based approach, on both the original and the string-obfuscated testing sets. The results show that the AUC of B2V is higher than that of IHB (0.94 vs. 0.81) on the string-obfuscated testing set, while B2V keeps comparable performance with IHB on the original testing set. Moreover, B2V can be easily retrained to adapt to string-obfuscated scenarios, with a 15%–20% performance improvement. In the interest of open science, we also make our dataset publicly available to seed future improvements.
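The abstract says that encoding each binary file as a vector lets retrieval be accelerated with locality-sensitive hashing, but it does not say which LSH family B2V uses. As an illustration only, the Python sketch below assumes random-hyperplane LSH (SimHash-style hashing for cosine similarity) over already-computed vectors. The class `HyperplaneLSH`, its `num_bits`/`num_tables` parameters, and the toy vectors standing in for encoder outputs are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index over binary-file vectors.

    Each of `num_tables` hash tables maps a vector to a `num_bits`-bit key,
    one bit per random Gaussian hyperplane (the sign of the dot product).
    Vectors at a small angle tend to share keys, so a query only needs to be
    compared exactly against the vectors that collide with it.
    """

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    @staticmethod
    def _key(planes, vec):
        # One bit per hyperplane: 1 if the vector lies on its non-negative side.
        return tuple(
            int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes
        )

    def add(self, name, vec):
        """Index a vector under `name` in every hash table."""
        self.vectors[name] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, vec)].append(name)

    def query(self, vec, top_k=5):
        """Return up to `top_k` indexed names, ranked by exact cosine similarity."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            # .get avoids inserting empty buckets into the defaultdict.
            candidates.update(table.get(self._key(planes, vec), []))
        ranked = sorted(
            candidates, key=lambda n: cosine(vec, self.vectors[n]), reverse=True
        )
        return ranked[:top_k]


if __name__ == "__main__":
    rng = random.Random(42)
    index = HyperplaneLSH(dim=16)
    # Toy stand-ins for encoder outputs: two builds of one component, plus noise.
    base = [rng.gauss(0.0, 1.0) for _ in range(16)]
    index.add("libfoo_gcc_O0", base)
    index.add("libfoo_gcc_O2", [x + rng.gauss(0.0, 0.05) for x in base])
    for i in range(100):
        index.add(f"other_{i}", [rng.gauss(0.0, 1.0) for _ in range(16)])
    print(index.query(base, top_k=2))
```

In this sketch, more bits per key shrink each bucket (fewer exact comparisons, lower recall), while more tables raise the chance that a near-duplicate build collides with the query in at least one table. Both settings are illustrative assumptions rather than values from B2V.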
Acknowledgment
This work was supported by the National Key Research and Development Program of China (2016YFB0800202); the National Natural Science Foundation of China under Grant No. U1636120; the Fundamental Theory and Cutting Edge Technology Research Program of the Institute of Information Engineering, CAS; SKLOIS (No. Y7Z0361104 and No. Y7Z0311104); the Key Program of the National Natural Science Foundation of China (U1766215); the Key Research Program of the Chinese MIIT under Grant No. JCKY2016602B001; the Beijing Municipal Science & Technology Commission under Grant No. Z161100002616032; and the Science and Technology Project of State Grid Corporation of China (No. 52110418001K).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Chen, Y., Li, H., Ma, Y., Shi, Z., Sun, L. (2018). Robust Network-Based Binary-to-Vector Encoding for Scalable IoT Binary File Retrieval. In: Chellappan, S., Cheng, W., Li, W. (eds) Wireless Algorithms, Systems, and Applications. WASA 2018. Lecture Notes in Computer Science, vol. 10874. Springer, Cham. https://doi.org/10.1007/978-3-319-94268-1_5
DOI: https://doi.org/10.1007/978-3-319-94268-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94267-4
Online ISBN: 978-3-319-94268-1
eBook Packages: Computer Science, Computer Science (R0)