ABSTRACT
Deep neural networks (DNNs) are notoriously brittle to small perturbations in their input data. This problem is analogous to the over-fitting problem in test-based program synthesis and automatic program repair, which is a consequence of the incomplete specification, i.e., the limited tests or training examples, that the synthesis or repair algorithm has to learn from. Recently, test generation techniques have been successfully employed to augment existing specifications of intended program behavior, improving the generalizability of program synthesis and repair. Inspired by these approaches, in this paper we propose a technique that re-purposes software testing methods, specifically mutation-based fuzzing, to augment the training data of DNNs, with the objective of enhancing their robustness. Our technique casts DNN data augmentation as an optimization problem: it uses genetic search to generate the most suitable variant of each input for training the DNN, while simultaneously identifying opportunities to accelerate training by skipping augmentation in many instances. We instantiate this technique in two tools, Sensei and Sensei-SA, and evaluate them on 15 DNN models spanning 5 popular image datasets. Our evaluation shows that Sensei improves the robust accuracy of the DNN, compared to the state of the art, on each of the 15 models, by up to 11.9% and by 5.5% on average. Further, Sensei-SA reduces average DNN training time by 25% while still improving robust accuracy.
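The genetic-search augmentation described above can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: all names (`mutate`, `crossover`, `genetic_augment`, `maybe_augment`) are hypothetical, additive noise stands in for the paper's image transformations (rotation, translation, shear), and the fitness and skip criteria are simplified to the model's per-example loss.

```python
import numpy as np

def mutate(x, rng):
    # Placeholder mutation: small random perturbation of the input.
    # (The actual system mutates images with spatial transformations.)
    return x + rng.normal(scale=0.1, size=x.shape)

def crossover(a, b, rng):
    # Uniform crossover between two candidate variants.
    mask = rng.random(a.shape) < 0.5
    return np.where(mask, a, b)

def genetic_augment(x, model_loss, rng, pop_size=8, generations=3):
    """Return the variant of x with the highest loss found by genetic search.

    Training on such worst-case (hardest) variants is what is meant to
    improve robustness; model_loss is a stand-in for the DNN's loss.
    """
    population = [mutate(x, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness = loss; keep the hardest half as parents.
        population.sort(key=model_loss, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a = parents[rng.integers(len(parents))]
            b = parents[rng.integers(len(parents))]
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=model_loss)

def maybe_augment(x, model_loss, rng, threshold=0.5):
    # Simplified Sensei-SA-style skip heuristic: if the model already
    # handles the original input well (low loss), skip the costly search.
    if model_loss(x) < threshold:
        return x
    return genetic_augment(x, model_loss, rng)
```

In a training loop, `maybe_augment` would replace each minibatch example before the gradient step; the skip heuristic is what yields the training-time savings attributed to Sensei-SA.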
Index Terms
- Fuzz testing based data augmentation to improve robustness of deep neural networks