Abstract:
Recent research shows that 4-bit data precision is sufficient for Deep Neural Network (DNN) inference without accuracy degradation. Because of the low bit-width, a large fraction of the data values are repeated. In this article, we propose a hardware architecture, named Rare Computing Architecture (RCA), that skips the redundant computations caused by repeated values in the networks. By exploiting value repetition rather than sparsity, RCA is largely insensitive to variations in data sparsity and sustains large improvements in performance and energy efficiency, whereas the gains of existing sparsity-based DNN accelerators degrade as sparsity varies. In RCA, repeated values within a censoring window are detected by a Redundancy Censoring Unit (RCU) and processed together, yielding high effective throughput. Additionally, we present a dataflow that exploits the abundant data reusability in DNNs, keeping the high-throughput datapath continuously fed without increasing the data-read bandwidth. The proposed architecture is evaluated in two configurations, exploiting weight repetition and activation repetition, respectively. In the evaluation, RCA is compared against a value-agnostic baseline and UCNN, a state-of-the-art accelerator that exploits weight repetition; it is also compared with Bit-Pragmatic, which exploits bit-level sparsity. Both evaluations demonstrate that RCA delivers consistently high improvements in performance and energy efficiency.
Published in: IEEE Transactions on Computers (Volume 71, Issue 4, April 2022)
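
To make the repetition-skipping idea concrete, below is a minimal Python sketch of a factorized dot product over 4-bit weights. This is an illustrative assumption of the general technique, not the authors' RCU hardware or dataflow: activations paired with identical weight values are accumulated first, so each distinct 4-bit value is multiplied only once, analogous to detecting repeated data in a window and processing it at a time.

```python
import numpy as np

def factorized_dot(weights_4bit, activations):
    """Dot product that multiplies each distinct 4-bit weight only once.

    Illustrative sketch (hypothetical, not the RCA implementation):
    activations sharing the same weight value are summed first, so the
    multiplication count drops from len(weights_4bit) to at most 16.
    """
    acc = 0
    for w in np.unique(weights_4bit):  # at most 16 distinct 4-bit values
        # Sum all activations paired with this weight value, multiply once.
        acc += int(w) * int(activations[weights_4bit == w].sum())
    return acc

# Usage: with 4-bit precision, long dot products repeat values heavily,
# so the factorized form matches the plain dot product with far fewer multiplies.
rng = np.random.default_rng(0)
weights = rng.integers(0, 16, size=256)   # 4-bit unsigned weights
acts = rng.integers(0, 16, size=256)      # 4-bit unsigned activations
assert factorized_dot(weights, acts) == int(np.dot(weights, acts))
```

Because the number of multiplications is bounded by the 16 possible 4-bit values rather than by the vector length, the savings hold regardless of how sparse the data is, which is the intuition behind RCA's insensitivity to sparsity variations.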