
A Unified and Energy-Efficient Depthwise Separable Convolution Accelerator

  • Conference paper
VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence (VLSI-SoC 2023)

Abstract

Lightweight convolutional neural networks (CNNs) require far less computation than conventional CNNs, making them suitable for embedded devices with limited hardware resources. Depthwise separable convolution (DSC) is the fundamental convolution unit of lightweight CNNs. This paper introduces an Application-Specific Integrated Circuit (ASIC) hardware accelerator tailored for DSC, featuring a unified engine that supports both depthwise convolution (DWC) and pointwise convolution (PWC) with high hardware utilization: it sustains 100% processing element (PE) array utilization for DWC and up to 98% for PWC while minimizing latency. Partitioning the input feature map (ifmap) Static Random-Access Memory (SRAM) into three banks streamlines memory access. Furthermore, a data scheduling strategy, together with a multiplexed-register (MR) bank based First-In-First-Out (FIFO) buffer between adjacent PEs, maximizes data reuse and reduces latency. The design is implemented in a 22 nm FDSOI technology and validated on the CIFAR-10 dataset using the MobileNetV1 architecture. The proposed DSC accelerator operates at 1 GHz, achieving an energy efficiency of 5.07 (3.96) TOPS/W and an area efficiency of 519.2 (461.52) GOPS/mm² for DWC (PWC) at 0.8 V. Scaling the supply voltage down to 0.5 V increases the energy efficiency to 13.64 TOPS/W for DWC and 10.64 TOPS/W for PWC.
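To make the DWC/PWC split concrete, the following is a minimal NumPy sketch of depthwise separable convolution with illustrative tensor sizes (all sizes and variable names are hypothetical, chosen for this example; the sketch models only the arithmetic, not the paper's PE array, SRAM banking, or dataflow). It also counts multiply-accumulate (MAC) operations to show the well-known DSC reduction factor of 1/C_out + 1/K² relative to a standard K×K convolution.

```python
import numpy as np

# Illustrative sizes (not from the paper).
H = W = 8          # ifmap spatial dimensions
C_in, C_out = 4, 8  # input / output channel counts
K = 3               # depthwise kernel size

rng = np.random.default_rng(0)
ifmap = rng.standard_normal((C_in, H, W))
dw_kernels = rng.standard_normal((C_in, K, K))   # one KxK filter per input channel
pw_kernels = rng.standard_normal((C_out, C_in))  # 1x1 filters mixing channels

# Depthwise convolution (DWC): each channel is filtered independently
# by its own KxK kernel ('valid' padding, stride 1).
Ho, Wo = H - K + 1, W - K + 1
dwc = np.zeros((C_in, Ho, Wo))
for c in range(C_in):
    for y in range(Ho):
        for x in range(Wo):
            dwc[c, y, x] = np.sum(ifmap[c, y:y + K, x:x + K] * dw_kernels[c])

# Pointwise convolution (PWC): a 1x1 convolution that mixes channels
# at every output pixel.
pwc = np.einsum('oc,chw->ohw', pw_kernels, dwc)

# MAC counts: DSC (DWC + PWC) versus an equivalent standard KxK convolution.
macs_dsc = C_in * K * K * Ho * Wo + C_out * C_in * Ho * Wo
macs_std = C_out * C_in * K * K * Ho * Wo
print(pwc.shape, macs_dsc, macs_std)
```

With these sizes the DSC path needs 2448 MACs versus 10368 for a standard 3×3 convolution, matching the ratio 1/C_out + 1/K² = 1/8 + 1/9 ≈ 0.236; it is this two-stage structure, with its very different DWC and PWC compute patterns, that motivates a unified engine keeping the PE array busy for both.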



Acknowledgments

This work is partially funded by the Federal Ministry of Education and Research (BMBF, Germany) under the projects NEUROTEC II (project number 16ME0399) and NeuroSys (project number 03ZU1106CA).

Author information

Corresponding author: Yi Chen.


Copyright information

© 2024 IFIP International Federation for Information Processing

About this paper


Cite this paper

Chen, Y., Lou, J., Lanius, C., Freye, F., Loh, J., Gemmeke, T. (2024). A Unified and Energy-Efficient Depthwise Separable Convolution Accelerator. In: Elfadel, I.M., Albasha, L. (eds) VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence. VLSI-SoC 2023. IFIP Advances in Information and Communication Technology, vol 680. Springer, Cham. https://doi.org/10.1007/978-3-031-70947-0_7


  • DOI: https://doi.org/10.1007/978-3-031-70947-0_7


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70946-3

  • Online ISBN: 978-3-031-70947-0

  • eBook Packages: Computer Science, Computer Science (R0)
