DOI: 10.1145/3378678.3391884
Short paper, SCOPES '20 (ACM conference proceedings)

Cross-layer approaches for improving the dependability of deep learning systems

Published: 25 May 2020

ABSTRACT

Deep Neural Networks (DNNs), the state-of-the-art computational models for many Artificial Intelligence (AI) applications, are inherently compute- and resource-intensive and hence cannot exploit traditional redundancy-based fault-mitigation techniques for enhancing the dependability of DNN-based systems. There is therefore a pressing need for alternative methods that improve reliability without a high expenditure of resources by exploiting the intrinsic characteristics of these networks. In this paper, we present cross-layer approaches that, based on the intrinsic characteristics of DNNs, employ software- and hardware-level modifications to improve the resilience of DNN-based systems to hardware-level faults, e.g., soft errors and permanent faults.
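
The abstract names the approach only at a high level; as a concrete illustration, the sketch below shows one well-known software-level technique in this space: restricting activation ranges so that a hardware fault which flips a high-order bit cannot propagate an extreme outlier through later layers (in the spirit of range-restriction methods such as Ranger and FT-ClipAct). This is a minimal, hypothetical example; the module name, threshold value, and PyTorch framing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ClippedReLU(nn.Module):
    """ReLU whose output is capped at a fixed threshold.

    Bounding activations limits how far a soft-error-induced outlier
    (e.g., a flipped exponent bit) can propagate through subsequent
    layers. The threshold here is an illustrative assumption;
    range-restriction methods typically profile it per layer from
    fault-free activation statistics.
    """

    def __init__(self, threshold: float = 6.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clamp to [0, threshold]: ordinary ReLU behavior below the
        # cap, saturation instead of unbounded growth above it.
        return torch.clamp(x, min=0.0, max=self.threshold)

# Demo: the last activation simulates a bit flip that produced a
# huge value; clipping bounds the corruption to the threshold.
acts = torch.tensor([0.5, 1.2, 3.0, 4.7e8])
print(ClippedReLU()(acts))  # tensor([0.5000, 1.2000, 3.0000, 6.0000])
```

Because the clipped value stays within the range seen during fault-free operation, later layers treat it as an ordinary large activation rather than a catastrophic outlier, which is why such low-cost range restriction can recover much of the accuracy lost to soft errors without any hardware redundancy.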


Published in

SCOPES '20: Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems
May 2020, 96 pages
ISBN: 9781450371315
DOI: 10.1145/3378678
Editor: Sander Stuijk
General Chair: Henk Corporaal

Copyright © 2020 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 25 May 2020



Acceptance Rates

SCOPES '20 paper acceptance rate: 8 of 13 submissions (62%)
Overall acceptance rate: 38 of 79 submissions (48%)
