Title: Fault Injection for TensorFlow Applications

Journal Article · IEEE Transactions on Dependable and Secure Computing

As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application’s resilience. In this work, we present TensorFI 1 and TensorFI 2, high-level fault injection (FI) frameworks for TensorFlow-based applications. TensorFI 1 and TensorFI 2 can inject both hardware and software faults into any general TensorFlow 1 and TensorFlow 2 program, respectively. Both are configurable FI tools that are flexible, easy to use, and portable. They can be integrated into existing TensorFlow programs to assess their resilience to different fault types (e.g., bit-flips in particular operations or layers). We use TensorFI 1 and TensorFI 2 to evaluate the resilience of 12 and 10 ML programs, respectively, written in TensorFlow, including DNNs used in the autonomous vehicle domain. The results give us insights into why some of the models are more resilient than others. We also measure the performance overheads of the two injectors and present four case studies, two for each tool, to demonstrate their utility.
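To make the fault model mentioned in the abstract concrete, the following is a minimal sketch of a single bit-flip injected into a float32 tensor, using NumPy only. It is not TensorFI's API: the function names `flip_bit` and `inject_random_bit_flip`, their parameters, and the choice of float32 are assumptions made purely for illustration.

```python
import numpy as np


def flip_bit(tensor, index, bit):
    """Return a float32 copy of `tensor` with one bit inverted.

    Hypothetical helper, not part of TensorFI. `index` addresses the
    flattened (row-major) tensor; `bit` 0 is the least significant
    mantissa bit and `bit` 31 is the IEEE-754 sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 31]")
    # Copy into a C-contiguous float32 buffer so the input stays untouched.
    faulty = np.array(tensor, dtype=np.float32, order="C")
    # Reinterpret the same memory as unsigned integers to reach raw bits.
    raw = faulty.reshape(-1).view(np.uint32)
    raw[index] ^= np.uint32(1 << bit)
    return faulty


def inject_random_bit_flip(tensor, rng):
    """Flip one randomly chosen bit in one randomly chosen element."""
    size = np.asarray(tensor).size
    index = int(rng.integers(size))
    bit = int(rng.integers(32))
    return flip_bit(tensor, index, bit), index, bit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activations = np.ones((2, 3), dtype=np.float32)  # stand-in for a layer output
    faulty, index, bit = inject_random_bit_flip(activations, rng)
    print(f"flipped bit {bit} of element {index}: {faulty.reshape(-1)[index]}")
```

In this sketch, flipping bit 31 of the value 1.0 yields -1.0, while flips in the exponent bits can change a value by many orders of magnitude. A resilience study would compare the program's output on the faulty tensor against its fault-free output.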

Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
AC05-76RL01830; 89233218CNA000001
OSTI ID:
1994506
Report Number(s):
PNNL-SA-161122; LA-UR-21-22618
Journal Information:
IEEE Transactions on Dependable and Secure Computing, Vol. 20, Issue 4; ISSN 1545-5971
Publisher:
IEEE
Country of Publication:
United States
Language:
English
