research-article

Bringing Engineering Rigor to Deep Learning

Published: 25 July 2019

Abstract

Deep learning (DL) systems are increasingly deployed in safety- and security-critical domains, including autonomous driving, robotics, and malware detection, where the correctness and predictability of a system on corner-case inputs are of great importance. Unfortunately, the common practice for validating a deep neural network (DNN), measuring overall accuracy on a randomly selected test set, is not designed to surface corner-case errors. As recent work shows, even DNNs with state-of-the-art accuracy are easily fooled by human-imperceptible adversarial perturbations to their inputs. Questions such as how to test corner-case behaviors more thoroughly and whether all adversarial samples have been found remain unanswered.
In the last few years, we have been working on bringing more engineering rigor to deep learning. Toward this goal, we have built five systems that test DNNs more thoroughly and verify the absence of adversarial samples for given datasets. These systems check a broad spectrum of properties (e.g., rotating an image should never change its classification) and find thousands of error-inducing samples for popular DNNs in critical domains (e.g., ImageNet, autonomous driving, and malware detection). Our DNN verifiers are also orders of magnitude (e.g., 5,000×) faster than similar tools. This article gives an overview of our systems and discusses three open research challenges, in the hope of inspiring further research on testing and verifying DNNs.
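To make the rotation property mentioned above concrete, the following is a minimal sketch of how such a check could be phrased as a test. It is not the implementation of any of the five systems: the names `rotate90`, `find_rotation_violations`, and `toy_classifier` are hypothetical, the classifier is a hand-written stand-in for a DNN, and only 90-degree rotations of small grayscale grids are considered.

```python
from typing import Callable, List

Image = List[List[float]]  # assumption: a grayscale image as a 2-D grid


def rotate90(img: Image) -> Image:
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def find_rotation_violations(
    classify: Callable[[Image], int], images: List[Image]
) -> List[int]:
    """Return indices of images whose label changes under a 90-, 180-,
    or 270-degree rotation."""
    violations = []
    for i, img in enumerate(images):
        expected = classify(img)
        rotated = img
        for _ in range(3):
            rotated = rotate90(rotated)
            if classify(rotated) != expected:
                violations.append(i)
                break
    return violations


def toy_classifier(img: Image) -> int:
    """Hypothetical stand-in for a DNN: label 1 if the top half of the
    image is brighter than the bottom half, else 0."""
    h = len(img)
    top = sum(sum(row) for row in img[: h // 2])
    bottom = sum(sum(row) for row in img[h // 2:])
    return 1 if top > bottom else 0


uniform = [[0.5, 0.5], [0.5, 0.5]]
top_bright = [[1.0, 1.0], [0.0, 0.0]]
violations = find_rotation_violations(toy_classifier, [uniform, top_bright])
print(violations)
```

Here the uniform image keeps its label under every rotation, while the second image is flagged because the toy classifier depends on orientation. A real test harness would replace `toy_classifier` with model inference and would likely cover a wider range of transformations.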

Cited By

  • (2023) The National Airworthiness Council artificial intelligence working group (NACAIWG) summit proceedings 2022. Systems Engineering 26:6 (925-930). DOI: 10.1002/sys.21703. Online publication date: 8-Jun-2023
  • (2021) An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems. 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) (238-250). DOI: 10.1109/ICSE43902.2021.00033. Online publication date: May-2021

Published In

ACM SIGOPS Operating Systems Review  Volume 53, Issue 1
July 2019
90 pages
ISSN:0163-5980
DOI:10.1145/3352020

Publisher

Association for Computing Machinery

New York, NY, United States
