Abstract
Deep neural networks (DNNs) have been widely adopted in many applications, including safety-critical domains such as autonomous driving. Despite their strong performance, DNNs can exhibit incorrect behaviors that may lead to serious accidents, so they urgently require security assurance when deployed in safety-critical applications. Deep testing has been developed as an effective technique for detecting incorrect DNN behaviors and improving robustness where necessary, but it needs a large number of labeled test cases, which are expensive to obtain because data labeling is labor-intensive. Test case prioritization has been proposed to identify error-exposing test cases earlier, and several techniques, such as DeepGini and PRIMA, achieve effective and efficient prioritization for classification tasks. However, these methods still face challenges such as unreliable validity, limited application scenarios, and high time complexity. To tackle these issues, we present BallPri, a novel test prioritization method for DNNs that uses the tolerant ball in variable space. It extracts the tolerant ball of each test case and uses the minimum non-parametric likelihood ratio (MinLR) to further enlarge the distribution difference in variable space, achieving effective and general test case prioritization. Extensive experiments on benchmark datasets and models validate that BallPri outperforms state-of-the-art methods in three key aspects: (1) Effective: it leverages the tolerant ball in variable space to identify malicious bug-revealing inputs. On average, BallPri improves prioritization effectiveness by 47.83% and prioritization efficiency by 37.27% compared with baselines. (2) Extensible: it can be applied to various tasks, data, and models. We verify the superiority of BallPri on classification and regression tasks, convolutional and recurrent neural network models, and image, text, and speech datasets. (3) Efficient: it achieves lower time complexity than existing methods. We further evaluate BallPri against potential adaptive attacks and provide guidance on its accuracy and robustness. The open-source code of BallPri can be downloaded at https://github.com/lixiaohaao/BallPri.
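To make concrete what test case prioritization means in practice, the following minimal Python sketch ranks unlabeled inputs by the Gini impurity of the model's softmax output, which is the score used by the DeepGini baseline named in the abstract. It is not BallPri: the tolerant ball and MinLR are not defined in this abstract, so they are not reproduced here. The function names and the toy probability vectors are assumptions made purely for illustration.

```python
import numpy as np


def gini_scores(probs):
    """Gini impurity 1 - sum_i p_i^2 for each row of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - np.sum(probs ** 2, axis=1)


def prioritize(probs):
    """Return test-case indices ordered from most to least uncertain."""
    scores = gini_scores(probs)
    # Negate so that the highest impurity comes first; a stable sort keeps ties in input order.
    return np.argsort(-scores, kind="stable")


# Hypothetical softmax outputs for three unlabeled test inputs (illustrative values only).
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # highly uncertain prediction
    [0.70, 0.20, 0.10],  # moderately uncertain prediction
])

order = prioritize(probs)
print(order.tolist())
```

Inputs at the front of the returned order would be labeled first, on the assumption that uncertain predictions are more likely to expose errors.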








Data availability
No datasets were generated or analysed during the current study.
References
Al-Qadasi, H., Wu, C., Falcone, Y., Bensalem, S.: Deepabstraction: 2-level prioritization for unlabeled test inputs in deep neural networks. In: 2022 IEEE International Conference On Artificial Intelligence Testing (AITest), pp. 64–71. IEEE (2022)
Benz, P., Zhang, C., Imtiaz, T., Kweon, I.S.: Double targeted universal adversarial perturbations. In: Ishikawa, H., Liu, C., Pajdla, T., Shi, J. (eds.) Computer Vision - ACCV 2020—15th Asian Conference on Computer Vision, Kyoto, Japan, November 30–December 4, 2020, Revised Selected Papers, Part IV. Lecture Notes in Computer Science, vol. 12625, pp. 284–300. Springer, Kyoto, Japan (2020). https://doi.org/10.1007/978-3-030-69538-5_18
Byun, T., Sharma, V., Vijayakumar, A., Rayadurgam, S., Cofer, D.: Input prioritization for testing neural networks. In: 2019 IEEE International Conference On Artificial Intelligence Testing (AITest), pp. 63–70. IEEE (2019)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE (2017)
Chen, J., Ge, J., Zheng, H.: Actgraph: prioritization of test cases based on deep neural network activation graph. Autom. Softw. Eng. 30(2), 28 (2023)
Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., Li, J.: Boosting adversarial attacks with momentum. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9185–9193 (2018)
Duarte, D., Nex, F., Kerle, N., Vosselman, G.: Satellite image classification of building damages using airborne and satellite image samples in a deep learning approach. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 4(2), 1–9 (2018)
Feng, Y., Shi, Q., Gao, X., Wan, J., Fang, C., Chen, Z.: Deepgini: prioritizing massive tests to enhance the robustness of deep neural networks. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 177–188 (2020)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, pp. 7–9 (2015). arXiv:1412.6572
Guo, J., Jiang, Y., Zhao, Y., Chen, Q., Sun, J.: Dlfuzz: differential fuzzing testing of deep learning systems. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 739–743 (2018)
Harel-Canada, F., Wang, L., Gulzar, M.A., Gu, Q., Kim, M.: Is neuron coverage a meaningful measure for testing deep neural networks? In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 851–862 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hong, J.: Why is artificial intelligence blamed more? Analysis of faulting artificial intelligence for self-driving car accidents in experimental settings. Int. J. Hum. Comput. Interact. 36(18), 1768–1774 (2020). https://doi.org/10.1080/10447318.2020.1785693
Kim, J., Feldt, R., Yoo, S.: Guiding deep learning system testing using surprise adequacy. In: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 1039–1049. IEEE (2019)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Workshop Track Proceedings. OpenReview.net, Toulon, France (2017). https://openreview.net/forum?id=HJGU3Rodl
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
Lee, S., Cha, S., Lee, D., Oh, H.: Effective white-box testing of deep neural networks with adaptive neuron-selection strategy. In: Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 165–176 (2020)
Li, C., Ma, X., Jiang, B., Li, X., Zhang, X., Liu, X., Cao, Y., Kannan, A., Zhu, Z.: Deep speaker: an end-to-end neural speaker embedding system, pp. 1–8 (2017). CoRR arXiv:1705.02304
Li, Y., Li, M., Lai, Q., Liu, Y., Xu, Q.: Testrank: Bringing order into unlabeled test instances for deep learning tasks. Adv. Neural Inf. Process. Syst. 34, 20874–20886 (2021)
Ma, L., Juefei-Xu, F., Zhang, F., Sun, J., Xue, M., Li, B., Chen, C., Su, T., Li, L., Liu, Y., et al.: Deepgauge: multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 120–131 (2018)
Ma, W., Papadakis, M., Tsakmalis, A., Cordy, M., Traon, Y.L.: Test selection for deep learning systems. ACM Trans. Softw. Eng. Methodol. 30(2), 13:1–13:22 (2021). https://doi.org/10.1145/3417330
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (2018)
Moosavi-Dezfooli, S.-M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582 (2016)
Mukhamadiyev, A., Khujayarov, I., Djuraev, O., Cho, J.: Automatic speech recognition method based on deep learning approaches for Uzbek language. Sensors 22(10), 3683 (2022). https://doi.org/10.3390/s22103683
Nasery, A., Thakur, S., Piratla, V., De, A., Sarawagi, S.: Training for the future: a simple gradient interpolation loss to generalize along time. In: Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, Virtual, pp. 19198–19209 (2021). https://proceedings.neurips.cc/paper/2021/hash/a02ef8389f6d40f84b50504613117f88-Abstract.html
Odena, A., Olsson, C., Andersen, D., Goodfellow, I.: Tensorfuzz: debugging neural networks with coverage-guided fuzzing. In: International Conference on Machine Learning, pp. 4901–4911. PMLR (2019)
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE (2016)
Pavlitskaya, S., Yıkmış, Ş., Zöllner, J.M.: Is neuron coverage needed to make person detection more robust? In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2889–2897 (2022)
Pei, K., Cao, Y., Yang, J., Jana, S.: Deepxplore: automated whitebox testing of deep learning systems. Commun. ACM 62(11), 137–145 (2019). https://doi.org/10.1145/3361566
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015). arXiv:1409.1556
Teng, Q., Liu, Z., Song, Y., Han, K., Lu, Y.: A survey on the interpretability of deep learning in medical diagnosis. Multimed. Syst. 28(6), 2335–2355 (2022). https://doi.org/10.1007/s00530-022-00960-4
Tian, Y., Pei, K., Jana, S., Ray, B.: Deeptest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, pp. 303–314 (2018)
Van der Maaten, L., Hinton, G.: Visualizing data using t-sne. J. Mach. Learn. Res. 9(11), 1–27 (2008)
Wang, Z., You, H., Chen, J., Zhang, Y., Dong, X., Zhang, W.: Prioritizing test inputs for deep neural networks via mutation analysis. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pp. 397–409. IEEE (2021)
Wei, Z., Wang, H., Ashraf, I., Chan, W.: Predictive mutation analysis of test case prioritization for deep neural networks. In: 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), pp. 682–693. IEEE (2022)
Weiss, M., Tonella, P.: Simple techniques work surprisingly well for neural network test prioritization and active learning (replicability study). In: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 139–150 (2022)
Wen, L., Jo, K.: Deep learning-based perception systems for autonomous driving: a comprehensive survey. Neurocomputing 489, 255–270 (2022). https://doi.org/10.1016/j.neucom.2021.08.155
Wicker, M., Huang, X., Kwiatkowska, M.: Feature-guided black-box safety testing of deep neural networks. In: Tools and Algorithms for the Construction and Analysis of Systems: 24th International Conference, TACAS 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14–20, 2018, Proceedings, Part I 24, pp. 408–426. Springer (2018)
Xie, X., Ma, L., Juefei-Xu, F., Xue, M., Chen, H., Liu, Y., Zhao, J., Li, B., Yin, J., See, S.: Deephunter: a coverage-guided fuzz testing framework for deep neural networks. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 146–157 (2019)
Yan, R., Chen, Y., Gao, H., Yan, J.: Test case prioritization with neuron valuation based pattern. Sci. Comput. Program. 215, 102761 (2022)
Yang, X., Liu, W., Zhang, S., Liu, W., Tao, D.: Targeted attention attack on deep learning models in road sign recognition. IEEE Internet Things J. 8(6), 4980–4990 (2021). https://doi.org/10.1109/JIOT.2020.3034899
You, H., Wang, Z., Chen, J., Liu, S., Li, S.: Regression fuzzing for deep learning systems. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), pp. 82–94. IEEE (2023)
Zhang, F., Hu, X., Ma, L., Zhao, J.: Deeprover: A query-efficient blackbox attack for deep neural networks. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1384–1394 (2023)
Zheng, H., Chen, J., Jin, H.: Certpri: certifiable prioritization for deep neural networks via movement cost in feature space. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 1–13. IEEE (2023)
Acknowledgements
This research was supported by the Zhejiang Provincial Natural Science Foundation (No. LDQ23F020001), the National Natural Science Foundation of China (Nos. 62072406, 62406286, 62103374), and the Key Research and Development Program of Zhejiang Province (No. 2022C01018).
Author information
Authors and Affiliations
Contributions
Chengyu Jia: Conceptualization, Data curation, Funding acquisition, Methodology, Resources, Writing-review & editing, Supervision, Formal analysis. Jinyin Chen: Conceptualization, Data curation, Methodology, Writing-original draft, Visualization, Supervision. Xiaohao Li: Conceptualization, Data curation, Funding acquisition, Methodology, Resources, Writing-review & editing, Supervision, Formal analysis. Haibin Zheng: Methodology, Writing-original draft, Validation. Luxin Zhang: Methodology, Validation.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jia, C., Chen, J., Li, X. et al. BallPri: test cases prioritization for deep neuron networks via tolerant ball in variable space. Autom Softw Eng 32, 29 (2025). https://doi.org/10.1007/s10515-025-00498-5
DOI: https://doi.org/10.1007/s10515-025-00498-5