Abstract:
Deep neural networks (DNNs) have demonstrated impressive performance on many edge computer vision tasks, driving increasing demand for DNN accelerators on mobile and Internet of Things (IoT) devices. However, their massive power consumption and storage requirements make hardware design challenging. In this paper, we introduce a DNN accelerator based on vector quantization (VQ), a model compression technique that reduces both the network model size and the computation cost. Moreover, a specialized processing element (PE) is designed with various SRAM bank configurations and dataflows, so that it supports different codebook/kernel sizes and maintains high utilization even with small input or output channel counts. Compared to the state of the art, the proposed accelerator architecture achieves a 4.2× reduction in memory access and 2.05× the throughput per cycle for batch-one inference.
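The abstract describes the core VQ idea: weight subvectors are replaced by indices into a small codebook, so a matrix-vector product reduces to precomputed table lookups and accumulations, saving both storage and arithmetic. Below is a minimal NumPy sketch of this scheme under illustrative assumptions; the function names, shapes, and the random codebook are hypothetical and do not reflect the paper's actual PE design or dataflow.

```python
# Minimal sketch of vector-quantized (VQ) inference, assuming a codebook
# learned offline (e.g., via k-means). Shapes and names are illustrative.
import numpy as np

def vq_compress(W, codebook):
    """Replace each weight subvector with the index of its nearest codeword.

    W:        (out_ch, in_ch) weight matrix, in_ch divisible by d
    codebook: (K, d) codewords of subvector length d
    Returns:  (out_ch, in_ch // d) array of codeword indices
    """
    K, d = codebook.shape
    sub = W.reshape(W.shape[0], -1, d)       # split each row into subvectors
    # squared distance from every subvector to every codeword
    dist = ((sub[:, :, None, :] - codebook[None, None, :, :]) ** 2).sum(-1)
    return dist.argmin(-1)                   # nearest-codeword indices

def vq_matvec(indices, codebook, x):
    """Approximate y = W @ x using only the codebook and indices.

    The dot product of each input subvector with all K codewords is
    precomputed once (a small K x (in_ch/d) table), so the per-output work
    becomes table lookups and adds -- the source of the compute savings.
    """
    K, d = codebook.shape
    x_sub = x.reshape(-1, d)                 # (in_ch/d, d) input subvectors
    lut = codebook @ x_sub.T                 # (K, in_ch/d) partial dot products
    return lut[indices, np.arange(indices.shape[1])].sum(-1)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
codebook = rng.standard_normal((16, 4))     # K=16 codewords of length d=4
idx = vq_compress(W, codebook)              # 64x8 indices replace 64x32 floats
x = rng.standard_normal(32)
print(np.abs(vq_matvec(idx, codebook, x) - W @ x).max())  # approximation error
```

Note the storage trade-off this sketch makes concrete: the dense weights shrink to small integer indices plus one shared codebook, and the lookup table `lut` is computed once per input rather than once per output row.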
Date of Conference: 26-29 May 2019
Date Added to IEEE Xplore: 01 May 2019
Print ISBN: 978-1-7281-0397-6
Print ISSN: 2158-1525