DOI: 10.1145/3377811.3380415
research-article

Fuzz testing based data augmentation to improve robustness of deep neural networks

Published: 01 October 2020

ABSTRACT

Deep neural networks (DNNs) have been shown to be notoriously brittle to small perturbations in their input data. This problem is analogous to the over-fitting problem in test-based program synthesis and automatic program repair, which is a consequence of an incomplete specification, i.e., the limited tests or training examples that the program synthesis or repair algorithm has to learn from. Recently, test generation techniques have been successfully employed to augment existing specifications of intended program behavior and thereby improve the generalizability of program synthesis and repair. Inspired by these approaches, in this paper we propose a technique that re-purposes software testing methods, specifically mutation-based fuzzing, to augment the training data of DNNs, with the objective of enhancing their robustness. Our technique casts the DNN data augmentation problem as an optimization problem. It uses genetic search to generate the most suitable variant of each input to use for training the DNN, while simultaneously identifying opportunities to accelerate training by skipping augmentation in many instances. We instantiate this technique in two tools, Sensei and Sensei-SA, and evaluate them on 15 DNN models spanning 5 popular image datasets. Our evaluation shows that Sensei improves the robust accuracy of the DNN, compared to the state of the art, on each of the 15 models, by up to 11.9% and by 5.5% on average. Further, Sensei-SA reduces the average DNN training time by 25% while still improving robust accuracy.
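The abstract gives only a high-level description of the search, so the following Python sketch is an illustration under stated assumptions, not the Sensei implementation. It assumes that a variant is encoded as a tuple of transformation parameters (a rotation and a shift, with made-up ranges), that the fitness of a variant is the loss the current model incurs on it, and that augmentation is skipped when even the worst variant found stays below a loss threshold. All names here (`BOUNDS`, `mutate`, `crossover`, `evolve`, `pick_variant`, `skip_below`) are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical encoding of a variant: (rotation in degrees, shift in pixels).
Params = Tuple[float, ...]
BOUNDS: List[Tuple[float, float]] = [(-30.0, 30.0), (-3.0, 3.0)]  # assumed ranges


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def mutate(p: Params, rng: random.Random, scale: float = 0.2) -> Params:
    """Add Gaussian noise to every parameter and clamp it to BOUNDS."""
    return tuple(
        clip(v + rng.gauss(0.0, scale * (hi - lo)), lo, hi)
        for v, (lo, hi) in zip(p, BOUNDS)
    )


def crossover(a: Params, b: Params, rng: random.Random) -> Params:
    """Uniform crossover: each parameter comes from one parent at random."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def evolve(population: List[Params],
           fitness: Callable[[Params], float],
           rng: random.Random) -> List[Params]:
    """One generation: keep the fitter half, refill with mutated offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: max(2, len(ranked) // 2)]
    children = [
        mutate(crossover(*rng.sample(parents, 2), rng), rng)
        for _ in range(len(population) - len(parents))
    ]
    return parents + children


def pick_variant(loss_of: Callable[[Params], float],
                 population: List[Params],
                 rng: random.Random,
                 skip_below: Optional[float] = None):
    """Return (variant to train on, or None to skip; next population).

    Assumption: the "most suitable" variant is the one with the highest loss.
    """
    best = max(population, key=loss_of)
    if skip_below is not None and loss_of(best) < skip_below:
        return None, population  # treated as already robust: no augmentation
    return best, evolve(population, loss_of, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for a model's loss: larger transformations hurt more.
    toy_loss = lambda p: abs(p[0]) / 30.0 + abs(p[1]) / 3.0
    population = [(0.0, 0.0), (5.0, 1.0), (-10.0, 0.5), (2.0, -2.0)]
    for epoch in range(3):
        variant, population = pick_variant(toy_loss, population, rng)
        print(epoch, variant)
```

In a real training loop, `loss_of` would apply the encoded transformation to one training image and evaluate the current model on the result, with one population kept per training input across epochs. Because `evolve` carries the fitter half forward unchanged, the best loss in a population never decreases from one generation to the next as long as the fitness function itself does not change.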

Published in

ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
June 2020
1640 pages
ISBN: 9781450371216
DOI: 10.1145/3377811

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 276 of 1,856 submissions, 15%
