Abstract:
A high degree of sparsity in deep learning models is regarded as a great opportunity to achieve aggressive energy and delay savings in both convolutional neural networks (e.g., sparsity > 89% [1]) and Transformers (e.g., > 75% [2]) by avoiding redundant computations. Despite this potential, the major barriers deterring the exploitation of sparsity are: 1) the unpredictable and unstructured nature of sparsity in the real-time input, and 2) the wide sparsity range across network models and across layers within the same model. In this work, we present Sparsity Adaptive Dynamic Frequency Modulation (SA-DFM) based on real-time input sparsity, in combination with the proposed sparsity-adaptive processing elements (PEs) in a 2D array. The sparsity record obtained from the output of the previous layer is exploited to modulate the frequency of the next layer, boosting performance by up to 1.8x while fully utilizing the power budget. Unlike sparsity-aware accelerators that gather non-zero elements via fine-grain control and lose efficacy with low or unstructured sparsity, the proposed work adjusts the frequency globally while maintaining the regular 2D array architecture with low (<7%) energy overhead, exploiting both weight and activation sparsity for convolutions and Transformers across a wide range (0-100%) of unstructured sparsity.
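To make the modulation idea concrete, the following is a minimal numeric sketch, not the paper's implementation: it assumes a simple linear mapping from the previous layer's measured activation sparsity to a clock multiplier capped at the reported 1.8x boost. The function names, the linear policy, and the base frequency are illustrative assumptions only.

```python
import numpy as np

def activation_sparsity(act: np.ndarray) -> float:
    """Fraction of zero elements in an activation tensor
    (the 'sparsity record' of the previous layer)."""
    return float(np.mean(act == 0))

def modulated_frequency(base_freq_mhz: float, sparsity: float,
                        max_boost: float = 1.8) -> float:
    """Illustrative policy: scale the next layer's clock linearly
    with observed sparsity, capped at the boost the power budget
    allows (1.8x per the abstract). The linear form is an assumption."""
    return base_freq_mhz * (1.0 + (max_boost - 1.0) * sparsity)

# Example: an activation vector that is 75% zeros.
act = np.array([0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0])
s = activation_sparsity(act)          # 0.75
f = modulated_frequency(1000.0, s)    # ~1600 MHz for a 1000 MHz base clock
```

Because the adjustment is a single global frequency knob rather than per-element gather logic, the PE array itself stays a regular 2D grid, which is the property the abstract credits for the low (<7%) energy overhead.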
Published in: 2023 IEEE Custom Integrated Circuits Conference (CICC)
Date of Conference: 23-26 April 2023
Date Added to IEEE Xplore: 11 May 2023