DOI: 10.1145/3316781.3317742 · DAC Conference Proceedings · Research Article

A Fault-Tolerant Neural Network Architecture

Published: 02 June 2019

Abstract

New DNN accelerators based on emerging technologies, such as resistive random access memory (ReRAM), are gaining increasing research attention for their potential for "in-situ" data processing. Unfortunately, device-level physical limitations unique to these technologies can disturb the weights stored in memory and thus compromise the accuracy and stability of DNN accelerators. In this work, we propose a novel fault-tolerant neural network architecture that mitigates the weight-disturbance problem without expensive retraining. Specifically, we propose a novel collaborative logistic classifier that enhances DNN stability by redesigning the binary classifiers derived from both traditional error-correcting output codes (ECOC) and modern DNN training algorithms. We also develop an optimized variable-length "decode-free" scheme that further boosts accuracy with fewer classifiers. Experimental results on state-of-the-art DNN models and complex datasets show that the proposed fault-tolerant neural network architecture effectively rectifies the accuracy degradation caused by weight disturbance at low cost, allowing its deployment in a variety of mainstream DNNs.
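The ECOC idea the abstract builds on replaces a single multiclass decision with several redundant binary (logistic) classifiers, so that a few faulty bits can still be decoded to the correct class. As a rough sketch only (not the paper's implementation; the code matrix and the soft minimum-distance decoding rule here are illustrative assumptions), fault-tolerant ECOC decoding can look like this:

```python
import numpy as np

# Illustrative ECOC code matrix: one 6-bit codeword per class, chosen so
# that any two codewords differ in at least 3 bits. A minimum Hamming
# distance of 3 lets decoding survive one flipped (disturbed) bit.
CODES = np.array([
    [0, 0, 0, 0, 0, 0],   # class 0
    [0, 1, 0, 1, 0, 1],   # class 1
    [1, 0, 1, 0, 1, 0],   # class 2
    [1, 1, 1, 1, 1, 1],   # class 3
])

def ecoc_decode(bit_probs, codes=CODES):
    """Soft minimum-distance decoding.

    bit_probs: per-bit outputs in [0, 1], e.g. from per-bit logistic
    classifiers. The predicted class is the one whose codeword is
    closest in L1 distance; no explicit error-correction decode table
    is needed.
    """
    dists = np.abs(codes - np.asarray(bit_probs, dtype=float)).sum(axis=1)
    return int(np.argmin(dists))

clean = CODES[2].astype(float)   # ideal classifier outputs for class 2
noisy = clean.copy()
noisy[0] = 1.0 - noisy[0]        # one bit flipped, as a disturbed weight might
print(ecoc_decode(clean))        # -> 2
print(ecoc_decode(noisy))        # -> still 2, despite the single-bit fault
```

In practice, each column of the code matrix would correspond to one trained binary classifier; the redundancy across columns is what absorbs weight disturbance.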




    Published In

    DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
    June 2019
    1378 pages
    ISBN:9781450367257
    DOI:10.1145/3316781


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    DAC '19

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


    Article Metrics

    • Downloads (Last 12 months)104
    • Downloads (Last 6 weeks)8
    Reflects downloads up to 28 Feb 2025


    Cited By

    • (2025) Layer ensemble averaging for fault tolerance in memristive neural networks. Nature Communications 16(1). DOI: 10.1038/s41467-025-56319-6. Online: 1 Feb 2025.
    • (2024) CorrectNet+: Dealing With HW Non-Idealities in In-Memory-Computing Platforms by Error Suppression and Compensation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43(2), 573-585. DOI: 10.1109/TCAD.2023.3313089. Online: Feb 2024.
    • (2024) Design for dependability — State of the art and trends. Journal of Systems and Software, 111989. DOI: 10.1016/j.jss.2024.111989. Online: Feb 2024.
    • (2023) Enabling Neuromorphic Computing for Artificial Intelligence with Hardware-Software Co-Design. Neuromorphic Computing. DOI: 10.5772/intechopen.111963. Online: 15 Nov 2023.
    • (2023) COLA. Proceedings of the 40th International Conference on Machine Learning, 40277-40289. DOI: 10.5555/3618408.3620092. Online: 23 Jul 2023.
    • (2023) A Design Methodology for Fault-Tolerant Neuromorphic Computing Using Bayesian Neural Network. Micromachines 14(10), 1840. DOI: 10.3390/mi14101840. Online: 27 Sep 2023.
    • (2023) Programming Techniques of Resistive Random-Access Memory Devices for Neuromorphic Computing. Electronics 12(23), 4803. DOI: 10.3390/electronics12234803. Online: 27 Nov 2023.
    • (2023) Resilience and Resilient Systems of Artificial Intelligence: Taxonomy, Models and Methods. Algorithms 16(3), 165. DOI: 10.3390/a16030165. Online: 18 Mar 2023.
    • (2023) ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3581784.3607077. Online: 12 Nov 2023.
    • (2023) BETTER: Bayesian-Based Training and Lightweight Transfer Architecture for Reliable and High-Speed Memristor Neural Network Deployment. IEEE Transactions on Circuits and Systems II: Express Briefs 70(6), 1846-1850. DOI: 10.1109/TCSII.2022.3231471. Online: Jun 2023.
