Fuzz testing based data augmentation to improve robustness of deep neural networks

Authors:
Xiang Gao, National University of Singapore, Singapore
Ripon K. Saha, Fujitsu Laboratories of America
Mukul R. Prasad, Fujitsu Laboratories of America
Abhik Roychoudhury, National University of Singapore, Singapore

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, June 2020, Pages 1147–1158, https://doi.org/10.1145/3377811.3380415

Published: 01 October 2020

ABSTRACT

Deep neural networks (DNNs) have been shown to be notoriously brittle to small perturbations in their input data. This problem is analogous to the over-fitting problem in test-based program synthesis and automatic program repair, which is a consequence of an incomplete specification, i.e., the limited tests or training examples that the program synthesis or repair algorithm has to learn from. Recently, test generation techniques have been successfully employed to augment existing specifications of intended program behavior and thereby improve the generalizability of program synthesis and repair. Inspired by these approaches, we propose a technique that re-purposes software testing methods, specifically mutation-based fuzzing, to augment the training data of DNNs, with the objective of enhancing their robustness. Our technique casts the DNN data augmentation problem as an optimization problem. It uses genetic search to generate the most suitable variant of each input to use for training the DNN, while simultaneously identifying opportunities to accelerate training by skipping augmentation in many instances. We instantiate this technique in two tools, Sensei and Sensei-SA, and evaluate them on 15 DNN models spanning 5 popular image data-sets. Our evaluation shows that Sensei can improve the robust accuracy of the DNN, compared to the state of the art, on each of the 15 models, by up to 11.9% and by 5.5% on average. Further, Sensei-SA can reduce the average DNN training time by 25%, while still improving robust accuracy.
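The idea of treating augmentation as a search problem can be illustrated with a minimal sketch. This is not the Sensei implementation: the abstract does not define "most suitable", so the sketch assumes it means the variant with the highest loss under the current model, and it assumes augmentation is skipped when the loss on the unmodified input is already below a threshold. The parameter space (rotation and two shifts), the names `genetic_search`, `choose_training_variant`, and `toy_loss`, and all default values are hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple

# Hypothetical search space: (rotation in degrees, horizontal shift, vertical shift).
Params = Tuple[float, ...]
BOUNDS: Sequence[Tuple[float, float]] = ((-30.0, 30.0), (-3.0, 3.0), (-3.0, 3.0))
IDENTITY: Params = (0.0, 0.0, 0.0)  # the unmodified input


def random_params(rng: random.Random) -> Params:
    """Draw one transformation uniformly from the search space."""
    return tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS)


def mutate(p: Params, rng: random.Random, scale: float = 0.1) -> Params:
    """Perturb each parameter with Gaussian noise, clipped to its bounds."""
    return tuple(
        min(hi, max(lo, v + rng.gauss(0.0, scale * (hi - lo))))
        for v, (lo, hi) in zip(p, BOUNDS)
    )


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """Uniform crossover: take each parameter from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def genetic_search(fitness: Callable[[Params], float], rng: random.Random,
                   pop_size: int = 10, generations: int = 5,
                   elite: int = 2) -> Params:
    """Return the highest-fitness transformation found by a small genetic search."""
    population = [random_params(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]
        children = []
        for _ in range(pop_size - elite):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = ranked[:elite] + children  # elitism keeps the best so far
    return max(population, key=fitness)


def choose_training_variant(loss_of: Callable[[Params], float],
                            rng: random.Random,
                            skip_threshold: float) -> Params:
    """Assumed skipping rule: no search if the unmodified input is already easy."""
    if loss_of(IDENTITY) < skip_threshold:
        return IDENTITY
    return genetic_search(loss_of, rng)


def toy_loss(p: Params) -> float:
    """Stand-in for the model's loss on the transformed input (illustration only)."""
    return sum(abs(v) / hi for v, (_, hi) in zip(p, BOUNDS))
```

In a real training loop, `loss_of` would apply the transformation to one training image and evaluate the current model's loss on it, and the returned parameters would determine which variant of that image is used in the next epoch.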

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering
    2. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging

Published in
ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
June 2020
1640 pages
ISBN: 9781450371216
DOI: 10.1145/3377811
General Chairs: Gregg Rothermel (North Carolina State University), Doo-Hwan Bae (KAIST, South Korea)
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DNN
data augmentation
genetic algorithm
robustness
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
