BallPri: test cases prioritization for deep neuron networks via tolerant ball in variable space

Automated Software Engineering

Abstract

Deep neural networks (DNNs) have gained widespread adoption in various applications, including safety-critical domains such as autonomous driving. However, despite their impressive capabilities and outstanding performance, DNNs can also exhibit incorrect behaviors that may lead to serious accidents. As a result, they urgently require security assurance when applied to safety-critical applications. Deep testing has been developed as an effective technique for detecting incorrect DNN behaviors and improving robustness when necessary, but it needs a large number of labeled test cases, which are expensive to obtain because data labeling is labor-intensive. Test case prioritization has been proposed to identify error-exposing test cases earlier, and several techniques, such as DeepGini and PRIMA, achieve effective and efficient prioritization for classification tasks. However, these methods still face challenges such as unreliable validity, limited application scenarios, and high time complexity. To tackle these issues, we present BallPri, a novel test prioritization method for DNNs that uses the tolerant ball in variable space. It extracts the tolerant ball of different test cases and uses the minimum non-parametric likelihood ratio (MinLR) to further enlarge the difference between distributions in variable space, achieving effective and general test case prioritization. Extensive experiments on benchmark datasets and models validate that BallPri outperforms state-of-the-art methods in three key aspects: (1) Effective: it leverages the tolerant ball in variable space to identify malicious bug-revealing inputs. On average, BallPri improves prioritization effectiveness by 47.83% and prioritization efficiency by 37.27% compared with baselines. (2) Extensible: it can be applied to various tasks, data, and models. We verify the superiority of BallPri on classification and regression tasks, convolutional and recurrent neural network models, and image, text, and speech datasets. (3) Efficient: it achieves low time complexity compared with existing methods. We further evaluate BallPri against potential adaptive attacks and provide guidance for its accuracy and robustness. The open-source code of BallPri can be downloaded at https://github.com/lixiaohaao/BallPri.
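
To make the prioritization setting concrete, the following minimal sketch ranks unlabeled test inputs with a DeepGini-style impurity score, one of the baselines named above. It is not BallPri: the tolerant ball and MinLR steps are not specified in this abstract and are not reproduced here. The function names `gini_score` and `prioritize` and the sample softmax outputs are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def gini_score(probs: Sequence[float]) -> float:
    """DeepGini-style impurity: 1 - sum(p_i^2). Higher means less confident."""
    return 1.0 - sum(p * p for p in probs)


def prioritize(softmax_outputs: Sequence[Sequence[float]]) -> List[int]:
    """Return test-case indices ordered from highest to lowest impurity."""
    scores = [gini_score(p) for p in softmax_outputs]
    # sorted() is stable, so tied scores keep their original relative order.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical softmax outputs for three unlabeled test inputs.
outputs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, least confident
    [0.70, 0.20, 0.10],  # moderately confident
]
order = prioritize(outputs)  # indices to label first, in order
```

Inputs that appear earlier in `order` would be labeled first, on the assumption that low-confidence predictions are more likely to expose errors. Because the score relies on class probabilities, this baseline applies only to classification, which illustrates the limited-scenario concern raised above.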

Data availability

No datasets were generated or analysed during the current study.

Notes

  1. http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz.

  2. http://benchmark.ini.rub.de/?section=gtsrb&subsection=dataset.

  3. https://image-net.org/.

  4. https://datashare.ed.ac.uk/download/DS_10283_3443.zip.

  5. https://rpg.ifi.uzh.ch/dronet.html.

Acknowledgements

This research was supported by the Zhejiang Provincial Natural Science Foundation (No. LDQ23F020001), the National Natural Science Foundation of China (Nos. 62072406, 62406286, 62103374), and the Key Research and Development Program of Zhejiang Province (No. 2022C01018).

Author information

Contributions

Chengyu Jia: Conceptualization, Data curation, Funding acquisition, Methodology, Resources, Writing-review & editing, Supervision, Formal analysis. Jinyin Chen: Conceptualization, Data curation, Methodology, Writing-original draft, Visualization, Supervision. Xiaohao Li: Conceptualization, Data curation, Funding acquisition, Methodology, Resources, Writing-review & editing, Supervision, Formal analysis. Haibin Zheng: Methodology, Writing-original draft, Validation. Luxin Zhang: Methodology, Validation.

Corresponding author

Correspondence to Haibin Zheng.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jia, C., Chen, J., Li, X. et al. BallPri: test cases prioritization for deep neuron networks via tolerant ball in variable space. Autom Softw Eng 32, 29 (2025). https://doi.org/10.1007/s10515-025-00498-5

  • DOI: https://doi.org/10.1007/s10515-025-00498-5
