Abstract
Lightweight convolutional neural networks (CNNs) reduce computational workloads compared to conventional CNNs, making them suitable for embedded devices with limited hardware resources. Depthwise separable convolution (DSC) is the fundamental convolution unit of lightweight CNNs. This paper introduces an application-specific integrated circuit (ASIC) hardware accelerator tailored for DSC, featuring a unified engine that supports both depthwise convolution (DWC) and pointwise convolution (PWC) with high hardware utilization: it sustains 100% processing element (PE) array utilization for DWC and up to 98% for PWC while minimizing latency. Partitioning the input feature map (ifmap) static random-access memory (SRAM) into three banks streamlines memory access. Furthermore, a data scheduling strategy, together with a multiplexed register (MR) bank-based first-in-first-out (FIFO) system between adjacent PEs, maximizes data reuse and reduces latency. The design is implemented in a 22 nm FDSOI technology and validated on the CIFAR-10 dataset using the MobileNetV1 architecture. The proposed DSC accelerator operates at 1 GHz, achieving an energy efficiency of 5.07 (3.96) TOPS/W and an area efficiency of 519.2 (461.52) GOPS/mm\(^{2}\) for DWC (PWC) at 0.8 V. Scaling the supply voltage down to 0.5 V increases the energy efficiency to 13.64 TOPS/W for DWC and 10.64 TOPS/W for PWC.
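To make the DWC/PWC terminology concrete, the following is a minimal NumPy sketch of the two stages that make up a depthwise separable convolution: DWC applies one spatial filter per input channel, and PWC then mixes channels with a 1x1 convolution. This is an illustrative reference model only, not the accelerator's datapath; all tensor shapes and function names here are our own choices for illustration.

```python
import numpy as np

def depthwise_conv(ifmap, dw_kernels):
    """Depthwise convolution: one K x K filter per channel, no channel mixing.
    ifmap: (C, H, W), dw_kernels: (C, K, K) -> output: (C, H-K+1, W-K+1)."""
    C, H, W = ifmap.shape
    _, K, _ = dw_kernels.shape
    out = np.zeros((C, H - K + 1, W - K + 1))
    for c in range(C):
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[c, i, j] = np.sum(ifmap[c, i:i+K, j:j+K] * dw_kernels[c])
    return out

def pointwise_conv(fmap, pw_kernels):
    """Pointwise (1x1) convolution: a per-pixel matmul that mixes channels.
    fmap: (C, H, W), pw_kernels: (M, C) -> output: (M, H, W)."""
    return np.tensordot(pw_kernels, fmap, axes=([1], [0]))

# A depthwise separable convolution simply chains the two stages.
x = np.random.rand(8, 6, 6)                    # 8-channel 6x6 ifmap
y = pointwise_conv(depthwise_conv(x, np.random.rand(8, 3, 3)),
                   np.random.rand(16, 8))      # 16 output channels
print(y.shape)  # (16, 4, 4)
```

Because the spatial filtering (DWC) and the channel mixing (PWC) have such different compute and data-reuse patterns, a unified engine that keeps the PE array busy for both stages, as targeted in this work, is the key design challenge.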
Acknowledgments
This work is partially funded by the Federal Ministry of Education and Research (BMBF, Germany) under the projects NEUROTEC II (project number 16ME0399) and NeuroSys (project number 03ZU1106CA).
Copyright information
© 2024 IFIP International Federation for Information Processing
Cite this paper
Chen, Y., Lou, J., Lanius, C., Freye, F., Loh, J., Gemmeke, T. (2024). A Unified and Energy-Efficient Depthwise Separable Convolution Accelerator. In: Elfadel, I.M., Albasha, L. (eds.) VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence. VLSI-SoC 2023. IFIP Advances in Information and Communication Technology, vol 680. Springer, Cham. https://doi.org/10.1007/978-3-031-70947-0_7
Print ISBN: 978-3-031-70946-3
Online ISBN: 978-3-031-70947-0