DOI: 10.1145/3324884.3416621
research-article

Multiple-boundary clustering and prioritization to promote neural network retraining

Published: 27 January 2021

Abstract

With the increasing application of deep learning (DL) models in safety-critical scenarios, effective and efficient DL testing techniques are in high demand to improve model quality. One major challenge is the data gap between the training data used to construct the models and the testing data used to evaluate them. To bridge this gap, testers aim to collect, with limited labeling effort, an effective subset of inputs from the testing contexts for retraining DL models.
To assist subset selection, we propose Multiple-Boundary Clustering and Prioritization (MCP), a technique that clusters test samples into the boundary areas of a DL model's multiple boundaries and assigns priorities so that samples are selected evenly from all boundary areas, ensuring that each boundary has enough useful samples for its reconstruction. To evaluate MCP, we conduct an extensive empirical study with three popular DL models and 33 simulated testing contexts. The experimental results show that, compared with state-of-the-art baseline methods, MCP is significantly more effective, as measured by the quality improvement of the retrained DL models, and is also more efficient in terms of time cost.
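
The selection idea described above (group samples by boundary area, then draw evenly from every area) can be illustrated with a minimal Python sketch. The abstract does not define how a boundary area is identified or how priority is computed, so the sketch makes two assumptions: a sample's boundary area is the ordered pair of its two most probable classes, and its priority is the gap between those two probabilities (a smaller gap is treated as closer to the boundary). The names `mcp_select`, `probs`, and `budget` are hypothetical, and each sample is assumed to have at least two class probabilities.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mcp_select(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Pick up to `budget` sample indices, spread evenly over boundary areas."""
    # Assumption: the (top-1, top-2) predicted class pair names the boundary
    # area a sample belongs to.
    areas: Dict[Tuple[int, int], List[Tuple[float, int]]] = defaultdict(list)
    for idx, p in enumerate(probs):
        ranked = sorted(range(len(p)), key=lambda c: p[c], reverse=True)
        top1, top2 = ranked[0], ranked[1]
        # Assumption: a smaller top-1/top-2 gap means closer to the boundary.
        margin = p[top1] - p[top2]
        areas[(top1, top2)].append((margin, idx))

    # One queue per area, ordered from smallest to largest margin.
    queues = [sorted(members) for _, members in sorted(areas.items())]

    selected: List[int] = []
    position = 0
    # Round-robin: take one sample per area per pass until the budget is met
    # or every queue is exhausted.
    while len(selected) < budget and any(position < len(q) for q in queues):
        for q in queues:
            if position < len(q) and len(selected) < budget:
                selected.append(q[position][1])
        position += 1
    return selected


if __name__ == "__main__":
    # Five samples over three classes (softmax-like outputs, made up here).
    example = [
        [0.50, 0.45, 0.05],
        [0.80, 0.15, 0.05],
        [0.10, 0.48, 0.42],
        [0.55, 0.40, 0.05],
        [0.05, 0.70, 0.25],
    ]
    print(mcp_select(example, 3))
```

The round-robin pass is what keeps the selection even: an area with many low-margin samples cannot use up the labeling budget before the other areas have contributed.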

Published In

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
December 2020
1449 pages
ISBN:9781450367684
DOI:10.1145/3324884

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. multiple-boundary
  3. neural network
  4. retraining
  5. software testing

Qualifiers

  • Research-article

Funding Sources

  • National Key R&D Program of China
  • National Natural Science Foundation of China

Conference

ASE '20
Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%

Cited By

  • (2025) Assessing the Robustness of Test Selection Methods for Deep Neural Networks. ACM Transactions on Software Engineering and Methodology. DOI: 10.1145/3715693. Online publication date: 29-Jan-2025
  • (2025) DeepFeature. Journal of Systems and Software 219:C. DOI: 10.1016/j.jss.2024.112201. Online publication date: 1-Jan-2025
  • (2025) Why and How We Combine Multiple Deep Learning Models With Functional Overlaps. Journal of Software: Evolution and Process 37:2. DOI: 10.1002/smr.70003. Online publication date: 16-Feb-2025
  • (2024) Prioritizing Test Inputs for DNNs Using Training Dynamics. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1219-1231). DOI: 10.1145/3691620.3695498. Online publication date: 27-Oct-2024
  • (2024) FAST: Boosting Uncertainty-based Test Prioritization Methods for Neural Networks via Feature Selection. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (895-906). DOI: 10.1145/3691620.3695472. Online publication date: 27-Oct-2024
  • (2024) GIST: Generated Inputs Sets Transferability in Deep Learning. ACM Transactions on Software Engineering and Methodology 33:8 (1-38). DOI: 10.1145/3672457. Online publication date: 3-Dec-2024
  • (2024) Test Selection for Deep Neural Networks using Meta-Models with Uncertainty Metrics. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (678-690). DOI: 10.1145/3650212.3680312. Online publication date: 11-Sep-2024
  • (2024) Distance-Aware Test Input Selection for Deep Neural Networks. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (248-260). DOI: 10.1145/3650212.3652125. Online publication date: 11-Sep-2024
  • (2024) Test Optimization in DNN Testing: A Survey. ACM Transactions on Software Engineering and Methodology 33:4 (1-42). DOI: 10.1145/3643678. Online publication date: 20-Apr-2024
  • (2024) Knowledge Graph Driven Inference Testing for Question Answering Software. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3639109. Online publication date: 20-May-2024