Abstract:
ReRAM-based computing excels at accelerating convolutional neural network (CNN) inference thanks to its high computing parallelism, but its rigid crossbar structure can become inefficient in the face of the random data sparsity abundant in CNNs. In this study, we propose 3A-ReRAM, a novel crossbar architecture that dynamically predicts accumulated results to enable adaptive activation accumulation, so that both zero and small values in feature maps can be exploited for speedup in each matrix-vector multiplication (MVM) operation. To predict results dynamically, we propose an efficient parallel predictor that finds larger adapted boxes for increased computing parallelism without hurting accuracy. For better scheduling among the dynamic predictions, we propose an efficient input window management scheme with lightweight hardware support. With dynamic prediction and calculation, the 3A-ReRAM architecture fits naturally into the ReRAM crossbar structure while enabling an entirely different way to dynamically exploit the sparsity and small values in feature maps. It greatly improves performance by increasing computing parallelism and saves energy through far fewer analog-to-digital conversions. Evaluation results show that the 3A-ReRAM architecture improves performance by up to 13.03×, 16.31×, 2.46×, and 2.58× over the ReRAM-based CNN accelerators ISAAC and PUMA (sparsity-unaware) and SRE and FORMS (sparsity-aware), and reduces total energy by 8.93×, 10.07×, 2.97×, and 4.58×, respectively.
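The kind of sparsity the abstract refers to can be illustrated in software. The sketch below is a minimal, hypothetical functional model (not the paper's analog hardware scheme): it performs an MVM while skipping activations that are zero or below a threshold, the values 3A-ReRAM exploits for speedup. The function name, the `threshold` parameter, and the example matrices are illustrative assumptions.

```python
import numpy as np

def sparse_mvm(weights, activations, threshold=0.0):
    """Functional sketch of a sparsity-aware MVM (illustrative only):
    accumulate a weight column only when the corresponding activation's
    magnitude exceeds `threshold`, skipping zeros and small values."""
    result = np.zeros(weights.shape[0])
    for j, a in enumerate(activations):
        if abs(a) > threshold:           # zero / small activations are skipped
            result += weights[:, j] * a  # one column accumulation per active input
    return result

W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, 2.0])            # the zero activation contributes nothing
print(sparse_mvm(W, x))                  # equals W @ x when threshold = 0
```

With `threshold = 0` the result is exact; a positive threshold trades a small accuracy loss for skipping more accumulations, analogous to how the paper exploits small feature-map values, not only exact zeros.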
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems ( Volume: 43, Issue: 1, January 2024)