Multiple-boundary clustering and prioritization to promote neural network retraining

Authors:

Baowen Xu

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering

Pages 410 - 422

https://doi.org/10.1145/3324884.3416621

Published: 27 January 2021

Abstract

With the increasing application of deep learning (DL) models in many safety-critical scenarios, effective and efficient DL testing techniques are much in demand to improve the quality of DL models. One of the major challenges is the data gap between the training data used to construct the models and the testing data used to evaluate them. To bridge the gap, testers aim to collect an effective subset of inputs from the testing contexts, with limited labeling effort, for retraining DL models.

To assist the subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique that clusters test samples into the boundary areas of a DL model's multiple boundaries and assigns priorities so that samples are selected evenly from all boundary areas, ensuring enough useful samples to reconstruct each boundary. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experimental results show that, compared with state-of-the-art baseline methods, MCP performs significantly better on effectiveness, measured by the improved quality of the retrained DL models, and also has an advantage on efficiency in terms of time cost.
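The abstract describes the selection idea only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes (these details are not stated in the abstract) that a sample's boundary area is identified by the two classes with the highest softmax probabilities, and that its priority is the ratio of the second-highest to the highest probability. The function names `boundary_area`, `closeness`, and `select_evenly` are hypothetical.

```python
from collections import defaultdict


def boundary_area(probs):
    """Return (top-1 class, top-2 class) for one softmax output vector.

    Assumption for illustration: this class pair names the boundary area.
    """
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return ranked[0], ranked[1]


def closeness(probs):
    """Assumed priority score: top-2 / top-1 probability.

    A value near 1 means the model is almost undecided between two classes.
    """
    top = sorted(probs, reverse=True)
    return top[1] / top[0] if top[0] > 0 else 0.0


def select_evenly(outputs, budget):
    """Pick up to `budget` sample indices, cycling over boundary areas."""
    # Cluster sample indices by their boundary area.
    areas = defaultdict(list)
    for idx, probs in enumerate(outputs):
        areas[boundary_area(probs)].append(idx)

    # Within each area, put the samples closest to the boundary first.
    queues = [
        sorted(members, key=lambda i: closeness(outputs[i]), reverse=True)
        for _, members in sorted(areas.items())
    ]

    # Round-robin over the areas so that every boundary contributes samples.
    selected = []
    while len(selected) < budget and any(queues):
        for queue in queues:
            if queue and len(selected) < budget:
                selected.append(queue.pop(0))
    return selected


if __name__ == "__main__":
    # Toy softmax outputs for a 3-class model (made-up numbers).
    outputs = [
        [0.50, 0.45, 0.05],
        [0.90, 0.08, 0.02],
        [0.10, 0.50, 0.40],
        [0.60, 0.10, 0.30],
    ]
    print(select_evenly(outputs, budget=3))
```

With the toy outputs above, the three selected samples come from three different boundary areas, whereas ranking by uncertainty alone could draw most of the budget from a single boundary.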

Published In

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering

December 2020

1449 pages

ISBN:9781450367684

DOI:10.1145/3324884

General Chair:
John Grundy,
Program Chairs:
Claire Le Goues,
David Lo

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Key R&D Program of China
National Natural Science Foundation of China

Conference

ASE '20

Sponsor:

ASE '20: 35th IEEE/ACM International Conference on Automated Software Engineering

December 21 - 25, 2020

Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
