Evaluating Surprise Adequacy for Deep Learning System Testing

Published: 29 March 2023

Abstract

The rapid adoption of Deep Learning (DL) systems in safety-critical domains such as medical imaging and autonomous driving urgently calls for ways to test their correctness and robustness. Borrowing from the concept of test adequacy in traditional software testing, existing work on testing DL systems initially investigated them from a structural point of view, leading to a number of coverage metrics. Our lack of understanding of the internal mechanism of Deep Neural Networks (DNNs), however, means that coverage metrics defined on the Boolean dichotomy of coverage are hard to interpret and understand intuitively. We propose the degree of out-of-distribution-ness of a given input as its adequacy for testing: the more surprising a given input is to the DNN under test, the more likely the system will show unexpected behavior for the input. We develop the concept of surprise into a test adequacy criterion, called Surprise Adequacy (SA). Intuitively, SA measures the difference between the behavior of the DNN for the given input and its behavior for the training data. We posit that a good test input should be sufficiently, but not overtly, surprising compared to the training dataset. This article evaluates SA using a range of DL systems from simple image classifiers to autonomous driving platforms, as well as both small and large data benchmarks ranging from MNIST to ImageNet. The results show that the SA value of an input can be a reliable predictor of the correctness of the model behavior. We also show that SA can be used to detect adversarial examples, and can be efficiently computed against large training datasets such as ImageNet using sampling.
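
As a concrete illustration of the metric, below is a minimal sketch of the distance-based variant of Surprise Adequacy (DSA), introduced in the authors' earlier ICSE 2019 work. It assumes that activation traces have already been extracted from a chosen layer of the DNN under test for both the training set and the new input; the function and variable names are illustrative, not taken from the authors' released implementation.

```python
import numpy as np

def dsa(train_ats, train_labels, input_at, predicted_class):
    """Distance-based Surprise Adequacy (DSA) for a single input.

    train_ats       : (N, D) activation traces of the training set (assumed precomputed)
    train_labels    : (N,) class labels of the training inputs
    input_at        : (D,) activation trace of the new input
    predicted_class : class predicted by the DNN for the new input
    """
    same = train_ats[train_labels == predicted_class]
    other = train_ats[train_labels != predicted_class]

    # Distance from the input to its nearest training trace of the predicted class.
    d_same = np.linalg.norm(same - input_at, axis=1)
    nearest_same = same[np.argmin(d_same)]
    dist_a = d_same.min()

    # Distance from that nearest same-class trace to the closest trace of any other class.
    dist_b = np.linalg.norm(other - nearest_same, axis=1).min()

    # Larger ratios indicate inputs closer to the class boundary, i.e., more surprising.
    return dist_a / dist_b
```

Inputs can then be ranked by their DSA scores, for instance to decide which unlabeled test inputs are most likely to reveal unexpected model behavior and should be inspected or labeled first.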

      Published In

      ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 2
      March 2023
      946 pages
      ISSN: 1049-331X
      EISSN: 1557-7392
      DOI: 10.1145/3586025
      Editor: Mauro Pezzè

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 29 March 2023
      Online AM: 06 July 2022
      Accepted: 11 June 2022
      Revised: 17 March 2022
      Received: 22 November 2021
      Published in TOSEM Volume 32, Issue 2

      Author Tags

      1. Test adequacy
      2. deep learning systems

      Qualifiers

      • Research-article

      Funding Sources

      • National Research Foundation of Korea (NRF)
      • Institute of Information & communications Technology Planning & Evaluation
      • Swedish Scientific Council

      Article Metrics

      • Downloads (Last 12 months): 333
      • Downloads (Last 6 weeks): 19
      Reflects downloads up to 03 Mar 2025

      Cited By

      • (2025) Markov model based coverage testing of deep learning software systems. Information and Software Technology, 179, 107628. DOI: 10.1016/j.infsof.2024.107628. Online publication date: Mar-2025.
      • (2024) Bridging the Gap between Real-world and Synthetic Images for Testing Autonomous Driving Systems. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 732-744. DOI: 10.1145/3691620.3695067. Online publication date: 27-Oct-2024.
      • (2024) Neuron Semantic-Guided Test Generation for Deep Neural Networks Fuzzing. ACM Transactions on Software Engineering and Methodology, 34(1), 1-38. DOI: 10.1145/3688835. Online publication date: 14-Aug-2024.
      • (2024) Neuron Sensitivity-Guided Test Case Selection. ACM Transactions on Software Engineering and Methodology, 33(7), 1-32. DOI: 10.1145/3672454. Online publication date: 12-Jun-2024.
      • (2024) Test Selection for Deep Neural Networks using Meta-Models with Uncertainty Metrics. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 678-690. DOI: 10.1145/3650212.3680312. Online publication date: 11-Sep-2024.
      • (2024) Test Optimization in DNN Testing: A Survey. ACM Transactions on Software Engineering and Methodology, 33(4), 1-42. DOI: 10.1145/3643678. Online publication date: 27-Jan-2024.
      • (2024) TEASMA: A Practical Methodology for Test Adequacy Assessment of Deep Neural Networks. IEEE Transactions on Software Engineering, 1-23. DOI: 10.1109/TSE.2024.3482984. Online publication date: 2024.
      • (2024) Defect-based Testing for Safety-critical ML Components. 2024 IEEE 35th International Symposium on Software Reliability Engineering Workshops (ISSREW), 255-262. DOI: 10.1109/ISSREW63542.2024.00088. Online publication date: 28-Oct-2024.
      • (2024) DeepFeature: Guiding adversarial testing for deep neural network systems using robust features. Journal of Systems and Software, 112201. DOI: 10.1016/j.jss.2024.112201. Online publication date: Aug-2024.
      • (2024) Neuron importance-aware coverage analysis for deep neural network testing. Empirical Software Engineering, 29(5). DOI: 10.1007/s10664-024-10524-x. Online publication date: 25-Jul-2024.
