Abstract:
The depthwise separable convolution, a key feature of the MobileNet models, has a different input reuse pattern from the standard convolution, and each dot product uses fewer input/weight pairs, which leads to extremely low MAC utilization. This article proposes the Mobileware architecture for high-performance acceleration of MobileNet workloads. A new channel stationary dataflow architecture distributes the on-chip buffers and places the distributed SRAMs near each processing element (PE), so that PEs and SRAMs can communicate with high bandwidth. The Mobileware architecture achieves 1.4× to 29.5× higher throughput than a conventional weight stationary-based hardware architecture, and the proposed design was verified on the Xilinx ZCU102 FPGA evaluation board.
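To make the claim about fewer input/weight pairs per dot product concrete, the following is a minimal sketch that counts the pairs accumulated into one output value for a standard, a depthwise, and a pointwise convolution. It is not the paper's performance model: the function names and the layer shape used in the usage note are illustrative assumptions, and stride, padding, and bias are ignored.

```python
def dot_product_length(kernel: int, in_channels: int, kind: str) -> int:
    """Input/weight pairs accumulated into a single output value."""
    if kind == "standard":
        # Each output sums over a K x K window across all input channels.
        return kernel * kernel * in_channels
    if kind == "depthwise":
        # Each output sums over a K x K window of one channel only.
        return kernel * kernel
    if kind == "pointwise":
        # 1 x 1 convolution: each output sums across input channels.
        return in_channels
    raise ValueError(f"unknown convolution kind: {kind}")


def mac_count(kernel: int, in_channels: int, out_channels: int,
              out_h: int, out_w: int, kind: str) -> int:
    """Total multiply-accumulates for one layer (hypothetical helper)."""
    # A depthwise layer produces one output map per input channel.
    maps = in_channels if kind == "depthwise" else out_channels
    return out_h * out_w * maps * dot_product_length(kernel, in_channels, kind)


def separable_mac_count(kernel: int, in_channels: int, out_channels: int,
                        out_h: int, out_w: int) -> int:
    """Depthwise layer followed by a pointwise layer."""
    depthwise = mac_count(kernel, in_channels, in_channels, out_h, out_w, "depthwise")
    pointwise = mac_count(1, in_channels, out_channels, out_h, out_w, "pointwise")
    return depthwise + pointwise
```

For an assumed 3×3 layer with 32 input channels, `dot_product_length` gives 288 pairs for the standard convolution but only 9 for the depthwise one. A MAC array sized for long dot products is therefore mostly idle on depthwise layers, which is the utilization problem the abstract describes.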
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 43, Issue: 9, September 2024)