Abstract:
Depthwise Separable Convolution (DSC) operations are fundamental to efficient deep neural network (DNN) architectures such as MobileNet and MobileNetV2, enabling significant reductions in computational load compared to standard convolution. However, hardware accelerators optimized for standard convolution often experience decreased Processing Element (PE) utilization when executing DSC operations, primarily due to the increased number of output feature maps in depthwise (DW) convolutions and associated memory bandwidth constraints. In this paper, we propose an optimized architecture and data flow for DSC operations that enhances PE utilization in convolution-based accelerators. By designing a data flow that computes DW and pointwise (PW) convolutions consecutively without storing intermediate DW results in memory, we eliminate bandwidth limitations and improve computational efficiency. The proposed PE architecture incorporates both standard convolution and DW operation modes with minimal hardware modifications, allowing efficient processing of DSC operations. Experimental results demonstrate that implementing the proposed PE design on hardware with 1,024 PEs operating at 100 MHz increases the throughput of MobileNetV2 from 12 frames per second to 139 frames per second, achieving a 10X improvement over existing hardware.
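The core idea the abstract describes, computing the depthwise (DW) and pointwise (PW) convolutions consecutively so the full intermediate DW feature map is never written to memory, can be illustrated in software. The following NumPy sketch is purely illustrative (function names, shapes, and the valid-padding / stride-1 choices are assumptions, not the paper's hardware dataflow): at each output position, the DW stage produces only a per-channel vector, which is immediately consumed by the PW stage.

```python
import numpy as np

def fused_dsc(x, dw_k, pw_w):
    """Fused depthwise-separable convolution (illustrative sketch).

    x    : (C, H, W)  input feature maps
    dw_k : (C, 3, 3)  one 3x3 depthwise kernel per channel
    pw_w : (M, C)     pointwise (1x1) weights mapping C -> M channels

    Valid padding, stride 1. The DW result exists only as a length-C
    vector per output position; no full DW map is materialized.
    """
    C, H, W = x.shape
    M = pw_w.shape[0]
    Ho, Wo = H - 2, W - 2
    out = np.zeros((M, Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            # Depthwise: one scalar per channel at this position.
            dw_vec = (x[:, i:i+3, j:j+3] * dw_k).sum(axis=(1, 2))
            # Pointwise: consume dw_vec immediately (the "fusion").
            out[:, i, j] = pw_w @ dw_vec
    return out
```

A hardware accelerator applies the same principle at the PE level: keeping the DW output on-chip and feeding it directly into the PW accumulation avoids the memory-bandwidth bottleneck the abstract attributes to DW convolutions.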
Date of Conference: 19-22 January 2025
Date Added to IEEE Xplore: 18 February 2025