
A Coordinated Model Pruning and Mapping Framework for RRAM-Based DNN Accelerators



Abstract:

Network sparsity, or pruning, is a pivotal technology for edge intelligence. Resistive random access memory (RRAM)-based accelerators, featuring dense storage and processing-in-memory capability, have demonstrated superior computing performance and energy efficiency over traditional CMOS-based accelerators for neural network applications. Unfortunately, RRAM-based accelerators suffer performance or energy degradation when deploying pruned models, impairing their competitiveness in edge intelligence scenarios. We observe that the essential reason is that the pruning technology and the mapping strategy in prior RRAM-based accelerators are optimized individually. As a result, the random zeros in the pruned deep neural network are irregularly distributed across the crossbars, degrading the computation parallelism of the crossbars without reducing the crossbar demand. In this work, we propose a coordinated model pruning and mapping framework to jointly optimize model accuracy and the efficiency of RRAM-based accelerators. For the mapping, we first decouple weight matrices in a bit-wise manner and map the bit matrices to different crossbars, where signed weights are represented in two's complement so as to save half of the required crossbars. For the pruning, we prune weight bits at the crossbar granularity so as to free the crossbars holding the pruned bits. Furthermore, we employ a reinforcement learning (RL) approach to automatically select the optimal crossbar-aware bit-pruning strategy for any given neural network without laborious human effort. We conducted experiments on a set of representative neural networks and compared our framework with state-of-the-art (SOTA) bit-sparsity works. The results show that automatic structured bit-pruning reduces energy by up to 89.64% and area overhead by 84.12% compared to the existing PRIME-like architecture. Besides, our framework outperforms the SOTA bit-sparsity design by 1.5× in terms of t...
Page(s): 2364 - 2376
Date of Publication: 14 November 2022
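
As a rough illustration of the bit-wise decoupling and crossbar-granularity bit pruning described in the abstract, the following minimal NumPy sketch (not taken from the paper; function names, the 8-bit width, and the fixed pruning mask are hypothetical) decomposes two's-complement weights into per-bit matrices, one per crossbar, and frees whole bit planes:

```python
import numpy as np

def to_bit_planes(weights_q, n_bits=8):
    """Decompose two's-complement quantized weights into per-bit 0/1 matrices.
    Each bit plane would be mapped to its own crossbar."""
    u = weights_q.astype(np.int64) & ((1 << n_bits) - 1)  # unsigned bit pattern
    return [((u >> k) & 1).astype(np.uint8) for k in range(n_bits)]

def from_bit_planes(planes, n_bits=8):
    """Reconstruct signed weights; the MSB plane carries the negative
    two's-complement weight -2^(n_bits-1)."""
    acc = np.zeros(planes[0].shape, dtype=np.int64)
    for k, p in enumerate(planes):
        scale = -(1 << k) if k == n_bits - 1 else (1 << k)
        acc += scale * p.astype(np.int64)
    return acc

def prune_bit_planes(planes, keep_mask):
    """Crossbar-granularity bit pruning: zero out (i.e., free) entire bit
    planes whose crossbars are dropped."""
    return [p if keep else np.zeros_like(p) for p, keep in zip(planes, keep_mask)]

# Example: 8-bit weights, dropping the two least-significant bit crossbars
# (a hypothetical pruning decision; the paper selects crossbars with RL).
rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=(4, 4), dtype=np.int64)
planes = to_bit_planes(w)
assert np.array_equal(from_bit_planes(planes), w)        # lossless round trip
pruned = prune_bit_planes(planes, [False, False] + [True] * 6)
print(np.abs(from_bit_planes(pruned) - w).max())         # error bounded by dropped LSBs
```

Zeroing a bit plane here stands in for freeing the corresponding crossbar; in the paper, the keep/drop decision per crossbar is chosen automatically by the RL agent rather than by a fixed mask.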

