Abstract
Binary code analysis is the foundation for research in vulnerability discovery, software protection, and malicious code analysis. However, analyzing binary files is challenging because high-level semantic information is absent, which leads to heavy dependence on analysts’ expertise and significantly reduces the efficiency of binary code analysis. Recent years have witnessed a blossoming of machine learning models for binary analysis, yet little research addresses the problem of binary code datasets. In this paper, we review the existing, available datasets and classify them according to their applications. We set up experiments that illustrate how dataset quality affects the performance of machine learning models in binary function recognition. Based on the experimental evaluation, we discuss the ground-truth and quality-evaluation problems of binary code datasets.
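To make the ground-truth issue concrete, the minimal sketch below (ours, not part of the paper) shows how function-boundary labels are commonly harvested when building function-recognition datasets: by reading the start address and size of every STT_FUNC symbol from an unstripped ELF binary. It assumes the pyelftools library is installed; zero-size symbols, compiler-generated thunks, and hand-written assembly without symbols already hint at why such labels can be incomplete or noisy.

```python
# Minimal sketch (illustrative, not from the paper): derive function-boundary
# ground truth from the symbol table of an unstripped ELF binary.
# Assumes pyelftools is available (pip install pyelftools).
import sys
from elftools.elf.elffile import ELFFile


def function_ground_truth(path):
    """Return (start_address, size, name) for every defined function symbol."""
    with open(path, "rb") as f:
        elf = ELFFile(f)
        symtab = elf.get_section_by_name(".symtab")
        if symtab is None:
            # Stripped binary: symbol-based ground truth is simply unavailable.
            return []
        funcs = [
            (sym["st_value"], sym["st_size"], sym.name)
            for sym in symtab.iter_symbols()
            if sym["st_info"]["type"] == "STT_FUNC" and sym["st_size"] > 0
        ]
        return sorted(funcs)


if __name__ == "__main__":
    for start, size, name in function_ground_truth(sys.argv[1]):
        print(f"{start:#010x} {size:6d} {name}")
```

Running the script on the same program built with and without debug symbols or stripping makes the dependence of such labels on build settings, and hence on dataset construction choices, immediately visible.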