Fault Injection for TensorFlow Applications
- University of British Columbia, Vancouver, BC (Canada)
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- University of Iowa, Iowa City, IA (United States)
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
As machine learning (ML) has seen increasing adoption in safety-critical domains (e.g., autonomous vehicles), the reliability of ML systems has also grown in importance. While prior studies have proposed techniques to enable efficient error resilience (e.g., selective instruction duplication), a fundamental requirement for realizing these techniques is a detailed understanding of the application's resilience. In this work, we present TensorFI 1 and TensorFI 2, high-level fault injection (FI) frameworks for TensorFlow-based applications. TensorFI 1 and TensorFI 2 can inject both hardware and software faults into general TensorFlow 1 and TensorFlow 2 programs, respectively. Both are configurable FI tools that are flexible, easy to use, and portable. They can be integrated into existing TensorFlow programs to assess their resilience to different fault types (e.g., bit-flips in particular operations or layers). We use TensorFI 1 and TensorFI 2 to evaluate the resilience of 12 and 10 ML programs, respectively, written in TensorFlow, including DNNs used in the autonomous vehicle domain. The results give us insights into why some of the models are more resilient than others. We also measure the performance overheads of the two injectors, and present four case studies, two for each tool, to demonstrate their utility.
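To make the bit-flip fault type mentioned in the abstract concrete, the following is a minimal sketch of a single-bit flip applied to a 32-bit floating-point value, such as one element of an operator's output. It is not the TensorFI API: the function names `flip_bit_float32` and `inject_single_bit_flip` are hypothetical, and the uniform random choice of element and bit is an assumption made for illustration only.

```python
import random
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of value's IEEE 754 float32 encoding."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask flips exactly the chosen bit.
    (faulty,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return faulty


def inject_single_bit_flip(values, rng):
    """Copy a list of floats and flip one random bit in one random element.

    Hypothetical helper: the uniform choice of element and bit position is an
    assumption for this sketch, not a description of any tool's fault model.
    """
    faulty = list(values)
    index = rng.randrange(len(faulty))
    bit = rng.randrange(32)
    faulty[index] = flip_bit_float32(faulty[index], bit)
    return faulty, index, bit


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    outputs = [0.5, -1.25, 3.0, 0.0]  # stand-in for an operator's output values
    faulty, index, bit = inject_single_bit_flip(outputs, rng)
    print(f"flipped bit {bit} of element {index}: {outputs[index]} -> {faulty[index]}")
```

The effect of a flip depends strongly on the bit position: flipping a low mantissa bit barely changes the value, while flipping a high exponent bit can change it by many orders of magnitude or produce infinity or NaN.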
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- Grant/Contract Number:
- AC05-76RL01830; 89233218CNA000001
- OSTI ID:
- 1994506
- Report Number(s):
- PNNL-SA-161122; LA-UR-21-22618
- Journal Information:
- IEEE Transactions on Dependable and Secure Computing, Vol. 20, Issue 4; ISSN 1545-5971
- Publisher:
- IEEE
- Country of Publication:
- United States
- Language:
- English