Abstract
Vulnerability detection is crucial to protecting software security. Nowadays, deep learning (DL) is the most promising technique to automate this detection task, leveraging its superior ability to extract patterns and representations from large volumes of code. Despite its promise, DL-based vulnerability detection remains in its early stages, with model performance varying considerably across datasets. Drawing insights from other well-explored application areas such as computer vision, we conjecture that the imbalance issue (the number of vulnerable code samples is extremely small) is at the core of this phenomenon. To validate the conjecture, we conduct a comprehensive empirical study involving nine open-source datasets and two state-of-the-art DL models. The results confirm our conjecture. We also obtain insightful findings on how existing imbalance solutions perform in vulnerability detection: they, too, behave differently across datasets and evaluation metrics. Specifically: 1) focal loss is more suitable for improving precision, 2) mean false error and class-balanced loss encourage recall, and 3) random over-sampling facilitates the F1-measure. However, none of them excels across all metrics. To delve deeper, we explore external influences on these solutions and offer insights for developing new ones.
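As context for the imbalance solutions named above, the following is a minimal NumPy sketch of binary focal loss, class-balanced weighting via the effective number of samples, and random over-sampling. It illustrates the ideas only; it is not the implementation evaluated in the study, and the hyper-parameters (gamma, alpha, beta) are illustrative defaults from the cited papers.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al.): down-weights easy, well-classified
    examples so the rare vulnerable class contributes more to the loss."""
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-weighting factor
    return -np.mean(at * (1 - pt) ** gamma * np.log(pt + 1e-12))

def class_balanced_weights(counts, beta=0.999):
    """Class-balanced weighting (Cui et al.): weight each class by the
    inverse 'effective number' of samples, (1 - beta^n) / (1 - beta)."""
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    w = 1.0 / effective_num
    return w / w.sum() * len(counts)  # normalise so weights average to 1

def random_oversample(X, y, seed=0):
    """Random over-sampling: duplicate minority-class (label 1) samples
    until both classes have the same size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    minority = X[y == 1]
    n_extra = int((y == 0).sum() - (y == 1).sum())
    idx = rng.integers(0, len(minority), n_extra)
    X_bal = np.concatenate([X, minority[idx]])
    y_bal = np.concatenate([y, np.ones(n_extra, dtype=y.dtype)])
    return X_bal, y_bal
```

The study's finding that these techniques trade off precision, recall, and F1 differently follows from where each intervenes: the losses reshape gradients per example or per class, while over-sampling changes the training distribution itself.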
This work is funded by the European Union’s Horizon Research and Innovation Programme under Grant Agreement n\(^\circ \)101070303.
Notes
- 4.
Notice: the number of samples differs slightly from the original paper [30] because we removed empty source code files from the provided datasets. Empty files cause compilation errors and degrade model performance.
References
Amankwah, R., Kudjo, P., Yeboah, S.: Evaluation of software vulnerability detection methods and tools: a review. Int. J. Comput. Appl. 169, 22–27 (2017). https://doi.org/10.5120/ijca2017914750
Arusoaie, A., Ciobâca, S., Craciun, V., Gavrilut, D., Lucanu, D.: A comparison of open-source static analysis tools for vulnerability detection in c/c++ code. In: 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, pp. 161–168. IEEE (2017). https://doi.org/10.1109/SYNASC.2017.00035
Asterisk team: Asterisk website (2022). https://www.asterisk.org/. Accessed 25 Aug 2023
Bellard, F.: QEMU website (2022). https://www.qemu.org/. Accessed 25 Aug 2023
FFmpeg team: Repository of FFmpeg on github (2023). https://github.com/FFmpeg/FFmpeg. Accessed 25 Aug 2023
Bommasani, R., Hudson, D.A., Adeli, E., et al.: On the opportunities and risks of foundation models. CoRR abs/2108.07258 (2021). https://arxiv.org/abs/2108.07258
Brown, T., Mann, B., Ryder, N., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems, pp. 1877–1901. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
Buda, M., Maki, A., Mazurowski, M.A.: A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106, 249–259 (2018). https://doi.org/10.1016/j.neunet.2018.07.011
Chakraborty, S., Krishna, R., Ding, Y., Ray, B.: Deep learning based vulnerability detection: are we there yet? IEEE Trans. Softw. Eng. 48(09), 3280–3296 (2022). https://doi.org/10.1109/TSE.2021.3087402
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16(1), 321–357 (2002). https://doi.org/10.1613/jair.953
Choi, S., Yang, S., Choi, S., Yun, S.: Improving test-time adaptation via shift-agnostic weight regularization and nearest source prototypes. In: Computer Vision - ECCV 2022, pp. 440–458. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19827-4_26
Croft, R., Xie, Y., Babar, M.A.: Data preparation for software vulnerability prediction: a systematic literature review. IEEE Trans. Softw. Eng. 49, 1044–1063 (2022). https://doi.org/10.1109/TSE.2022.3171202
Cui, Y., Jia, M., Lin, T.Y., Song, Y., Belongie, S.: Class-balanced loss based on effective number of samples. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9260–9269. IEEE (2019). https://doi.org/10.1109/CVPR.2019.00949
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186. Association for Computational Linguistics (2019). https://aclanthology.org/N19-1423.pdf
Drummond, C., Holte, R.: C4.5, class imbalance, and cost sensitivity: why under-sampling beats oversampling. In: International Conference on Machine Learning Workshop on Learning from Imbalanced Data Sets II, Washington, DC, USA (2003). https://www.site.uottawa.ca/~nat/Workshop2003/drummondc.pdf
Fell, J.: A review of fuzzing tools and methods. PenTest Magazine (2017)
Feng, Z., Guo, D., Tang, D., et al.: CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1536–1547. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.139
Garg, A., Degiovanni, R., Jimenez, M., Cordy, M., Papadakis, M., Le Traon, Y.: Learning from what we know: how to perform vulnerability prediction using noisy historical data. Empir. Softw. Eng. 27(7) (2022). https://doi.org/10.1007/s10664-022-10197-4
Guo, D., Ren, S., Lu, S., et al.: GraphCodeBERT: pre-training code representations with data flow. In: International Conference on Learning Representations (2021). https://openreview.net/pdf?id=jLoC4ez43PZ
Han, X., Zhang, Z., Ding, N., et al.: Pre-trained models: past, present and future. AI Open 2, 225–250 (2021). https://doi.org/10.1016/j.aiopen.2021.08.002
He, H., Ma, Y.: Imbalanced Learning: Foundations, Algorithms, and Applications, 1st edn. Wiley-IEEE Press, Hoboken (2013)
Huang, C.Y., Dai, H.L.: Learning from class-imbalanced data: review of data driven methods and algorithm driven methods. Data Sci. Finan. Econ. 1(1), 21–36 (2021). https://doi.org/10.3934/DSFE.2021002
Husain, H., Wu, H.H., Gazit, T., Allamanis, M., Brockschmidt, M.: CodeSearchNet challenge: evaluating the state of semantic code search. CoRR abs/1909.09436 (2019). https://arxiv.org/abs/1909.09436
Kim, J., Feldt, R., Yoo, S.: Guiding deep learning system testing using surprise adequacy. In: 41st International Conference on Software Engineering, pp. 1039–1049. IEEE Press (2019). https://doi.org/10.1109/ICSE.2019.00108
Koh, P.W., Sagawa, S., Marklund, H., et al.: WILDS: a benchmark of in-the-wild distribution shifts. In: 38th International Conference on Machine Learning, pp. 5637–5664. PMLR (2021)
Li, Z., Zou, D., Tang, J., Zhang, Z., Sun, M., Jin, H.: A comparative study of deep learning-based vulnerability detection system. IEEE Access 7, 103184–103197 (2019). https://doi.org/10.1109/ACCESS.2019.2930578
Li, Z., Zou, D., Xu, S., Jin, H., Zhu, Y., Chen, Z.: SySeVR: a framework for using deep learning to detect software vulnerabilities. IEEE Trans. Depend. Secure Comput. 19(04), 2244–2258 (2022). https://doi.org/10.1109/TDSC.2021.3051525
Li, Z., et al.: VulDeePecker: a deep learning-based system for vulnerability detection. In: 25th Annual Network and Distributed System Security Symposium. The Internet Society (2018). https://doi.org/10.14722/ndss.2018.23158
Lin, G., Xiao, W., Zhang, J., Xiang, Y.: Deep learning-based vulnerable function detection: a benchmark. In: Zhou, J., Luo, X., Shen, Q., Xu, Z. (eds.) ICICS 2019. LNCS, vol. 11999, pp. 219–232. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-41579-2_13
Lin, G., et al.: Cross-project transfer representation learning for vulnerable function discovery. IEEE Trans. Ind. Inf. 14(7), 3289–3297 (2018). https://doi.org/10.1109/TII.2018.2821768
Lin, G., et al.: Repository of lin2018 on github (2019). https://github.com/DanielLin1986/TransferRepresentationLearning. Accessed 25 Aug 2023
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 318–327 (2020). https://doi.org/10.1109/TPAMI.2018.2858826
Liu, Y., Ott, M., Goyal, N., et al.: RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019). https://arxiv.org/abs/1907.11692
Lu, J., Batra, D., Parikh, D., Lee, S.: ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. In: 33rd Conference on Neural Information Processing Systems (2019)
Lu, S., Guo, D., Ren, S., Huang, J., et al.: CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. In: Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. OpenReview.net (2021). https://openreview.net/forum?id=6lE4dQXaUcb
Lu, S., Guo, D., Ren, S., et al.: Implementation of CodeXGLUE (2022). https://github.com/microsoft/CodeXGLUE. Accessed 25 Aug 2023
Mazuera-Rozo, A., Mojica-Hanke, A., Linares-Vásquez, M., Bavota, G.: Shallow or deep? an empirical study on detecting vulnerabilities using deep learning. In: IEEE/ACM 29th International Conference on Program Comprehension, pp. 276–287 (2021). https://doi.org/10.1109/ICPC52881.2021.00034
Mendoza, J., Mycroft, J., Milbury, L., Kahani, N., Jaskolka, J.: On the effectiveness of data balancing techniques in the context of ml-based test case prioritization. In: 18th International Conference on Predictive Models and Data Analytics in Software Engineering, pp. 72–81. Association for Computing Machinery, New York (2022). https://doi.org/10.1145/3558489.3559073
Pidgin team: Pidgin website (2020). https://pidgin.im/. Accessed 25 Aug 2023
Pinconschi, E.: Repository of devign on github (2020). https://github.com/epicosy/devign. Accessed 25 Aug 2023
Leffler, S., et al.: Repository of libtiff on gitlab (2020). https://gitlab.com/libtiff/libtiff. Accessed 25 Aug 2023
Sharma, T., et al.: A survey on machine learning techniques for source code analysis. CoRR abs/2110.09610 (2021). https://arxiv.org/abs/2110.09610
Shen, Z., Chen, S., Coppolino, L.: A survey of automatic software vulnerability detection, program repair, and defect prediction techniques. Secur. Commun. Netw. 2020 (2020). https://doi.org/10.1155/2020/8858010
Shu, R., Xia, T., Williams, L., Menzies, T.: Dazzle: using optimized generative adversarial networks to address the security data class imbalance issue. In: 19th International Conference on Mining Software Repositories, pp. 144–155. Association for Computing Machinery, New York (2022). https://doi.org/10.1145/3524842.3528437
Sun, C., Myers, A., Vondrick, C., Murphy, K., Schmid, C.: VideoBERT: a joint model for video and language representation learning. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7463–7472. IEEE Computer Society, Los Alamitos (2019). https://doi.org/10.1109/ICCV.2019.00756
Truta, C., Randers-Pehrson, G., Dilger, A.E., Schalnat, G.E.: Repository of libpng on github (2023). https://github.com/glennrp/libpng. Accessed 25 Aug 2023
Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: 31st Conference on Neural Information Processing Systems. Curran Associates, Inc. (2017). https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
VLC team: VLC media player website (2023). https://github.com/videolan/vlc. Accessed 25 Aug 2023
Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q., Kennedy, P.: Training deep neural networks on imbalanced data sets. In: International Joint Conference on Neural Networks, pp. 4368–4374. IEEE (2016). https://doi.org/10.1109/IJCNN.2016.7727770
Yang, Z., Shi, J., He, J., Lo, D.: Natural attack for pre-trained models of code. In: International Conference on Software Engineering, pp. 1482–1493. Association for Computing Machinery (2022). https://doi.org/10.1145/3510003.3510146
You, Y., Zhang, Z., Hsieh, C., Demmel, J.: 100-epoch imagenet training with alexnet in 24 minutes. CoRR abs/1709.05011 (2017). https://arxiv.org/abs/1709.05011
Zhang, H., Li, Z., Li, G., Ma, L., Liu, Y., Jin, Z.: Generating adversarial examples for holding robustness of source code processing models. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 1169–1176 (2020). https://doi.org/10.1609/aaai.v34i01.5469
Zhou, Y., Liu, S., Siow, J., Du, X., Liu, Y.: Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In: 33rd International Conference on Neural Information Processing Systems, pp. 10197–10207. Curran Associates Inc., Red Hook (2019). https://dl.acm.org/doi/pdf/10.5555/3454287.3455202
Zou, Y., Yu, Z., Vijaya Kumar, B.V.K., Wang, J.: Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 297–313. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_18
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Guo, Y., Hu, Q., Tang, Q., Traon, Y.L. (2024). An Empirical Study of the Imbalance Issue in Software Vulnerability Detection. In: Tsudik, G., Conti, M., Liang, K., Smaragdakis, G. (eds) Computer Security – ESORICS 2023. ESORICS 2023. Lecture Notes in Computer Science, vol 14347. Springer, Cham. https://doi.org/10.1007/978-3-031-51482-1_19
DOI: https://doi.org/10.1007/978-3-031-51482-1_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51481-4
Online ISBN: 978-3-031-51482-1
eBook Packages: Computer Science (R0)