Loading [a11y]/accessibility-menu.js
Stuck-at Fault Tolerance in RRAM Computing Systems | IEEE Journals & Magazine | IEEE Xplore

Stuck-at Fault Tolerance in RRAM Computing Systems


Abstract:

Emerging metal-oxide resistive switching random-access memory (RRAM) devices and RRAM crossbars have demonstrated their potential in boosting the speed and energy-efficie...Show More

Abstract:

Emerging metal-oxide resistive switching random-access memory (RRAM) devices and RRAM crossbars have demonstrated their potential in boosting the speed and energy-efficiency of analog matrix-vector multiplication. However, due to the immature fabrication technology, commonly occurring Stuck-At-Faults (SAFs) seriously degrade the computational accuracy of an RRAM-based computing system (RCS). In this paper, we present a fault-tolerant framework for RCS. A mapping algorithm with inner fault tolerance is proposed to convert matrix parameters into RRAM conductances in RCS and tolerate SAFs by fully exploring the available mapping space. Two baseline redundancy schemes are proposed to ensure that RCS is effective when the percentage of faulty RRAM cells is high. To reduce the number of redundant RRAM cells when the SAFs follow a non-uniform distribution or an unknown distribution, a distribution-aware redundancy scheme and a re-configurable redundancy scheme are proposed to provide dynamic fault tolerance. Simulation results show that, the baseline redundancy schemes can improve the recognition accuracy of the MNIST data set to almost the same as the RRAM-fault-free case, with an energy overhead of approximately 30%. When SAFs follow a non-uniform and an unknown distribution, the distribution-aware and re-configurable schemes can reduce the number of redundant RRAM cells from more than 200% to less than 40% and 60%, respectively, without reducing the recognition accuracy.
Page(s): 102 - 115
Date of Publication: 23 November 2017

ISSN Information:

Funding Agency:


References

References is not available for this document.