Abstract:
Accelerators for DNN inference in embedded applications using fixed-point arithmetic are attractive from the perspectives of hardware complexity and power consumption. While techniques have been proposed for DNN inference with constrained values, they typically incur a loss of inference accuracy. We propose instead an inferencing architecture predicated on tuning the weights, with minimal impact on accuracy, to facilitate sharing of shift and add operations across different weight computations in a multiplier-less manner. The distribution of ones and zeros in the binary encoding of the multiplier weights, which bears a highly nonlinear relationship to the weight values, is exploited to share shift-add operations. A systolic array architecture supporting this computation paradigm is developed. Experimental results on hardware savings and power and latency trade-offs are presented and demonstrate the benefits of the proposed scheme.
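To make the multiplier-less scheme concrete, the following is a minimal sketch (not the paper's architecture or RTL) of how a fixed-point weight's binary encoding turns a multiplication into shifts and adds, and how shift results for one activation can be reused across several weights. The function names and the simple cache are illustrative assumptions, not from the paper.

```python
def po2_terms(w):
    """Decompose an integer weight w into signed power-of-two terms
    (shift, sign) from its binary encoding, e.g. 6 -> [(1, +1), (2, +1)],
    so that x * w = sum(sign * (x << shift))."""
    sign = 1 if w >= 0 else -1
    w = abs(w)
    terms, shift = [], 0
    while w:
        if w & 1:
            terms.append((shift, sign))
        w >>= 1
        shift += 1
    return terms


def shift_add_outputs(x, weights):
    """Multiply one activation x by many weights without a multiplier.
    Each shift of x is computed once and cached, so weights whose
    encodings share a bit position share the shift operation
    (a software stand-in for sharing shift hardware)."""
    shifted = {}  # shift amount -> x << shift, computed once
    outs = []
    for w in weights:
        acc = 0
        for shift, sign in po2_terms(w):
            if shift not in shifted:
                shifted[shift] = x << shift  # shared across weights
            acc += sign * shifted[shift]
        outs.append(acc)
    return outs


# Sanity check against ordinary multiplication
x, ws = 3, [6, 11, -9, 4]
assert shift_add_outputs(x, ws) == [x * w for w in ws]
```

In this toy model, tuning a weight so that its encoding reuses bit positions already needed by other weights reduces the number of distinct shifts, which is the kind of sharing the abstract describes; the paper's systolic array realizes this sharing in hardware.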
Date of Conference: 07-10 August 2022
Date Added to IEEE Xplore: 22 August 2022