ABSTRACT
Deep Neural Networks (DNNs) - the state-of-the-art computational models for many Artificial Intelligence (AI) applications - are inherently compute- and resource-intensive and, hence, cannot afford traditional redundancy-based fault-mitigation techniques for enhancing the dependability of DNN-based systems. There is therefore a dire need for alternative methods that improve reliability without a high expenditure of resources by exploiting the intrinsic characteristics of these networks. In this paper, we present cross-layer approaches that, based on the intrinsic characteristics of DNNs, employ software- and hardware-level modifications to improve the resilience of DNN-based systems to hardware-level faults, e.g., soft errors and permanent faults.
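One concrete software-level instance of exploiting a network's intrinsic characteristics is range restriction of activations, as pursued in the FT-ClipAct and Ranger lines of work: a soft error that flips a high-order bit typically produces an activation far outside the value range observed during fault-free profiling, so clipping activations to a profiled bound suppresses fault propagation at negligible cost. The following is a minimal NumPy sketch of the idea; the threshold value and the injected fault are illustrative assumptions, not values from the paper.

```python
import numpy as np

def clipped_relu(x, threshold):
    """ReLU with an upper bound: activations pushed to very large
    values by bit flips are clipped back into the profiled range,
    limiting how far the fault propagates through later layers."""
    return np.minimum(np.maximum(x, 0.0), threshold)

# Simulate a single-bit soft error that corrupts one activation.
activations = np.array([0.3, -0.1, 0.8, 0.5])
faulty = activations.copy()
faulty[2] = 2.0 ** 20  # corrupted value far outside the profiled range

# With an (assumed) profiled bound of 1.0, the corrupted activation
# is clamped to the threshold instead of dominating the next layer.
safe = clipped_relu(faulty, threshold=1.0)
print(safe)
```

In practice the threshold would be chosen per layer from fault-free profiling runs, trading a small accuracy loss on rare in-range outliers for a large reduction in critical, misclassification-causing faults.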