
The Design of Efficient Data Flow and Low-Complexity Architecture for a Highly Configurable CNN Accelerator

Published in: Circuits, Systems, and Signal Processing

Abstract

This paper presents a highly configurable, low-complexity CNN accelerator based on the MobileNetV3 model. To the best of the authors' knowledge, this is the first CNN accelerator design based on MobileNetV3. A highly efficient processing flow and memory-access scheme are proposed so that throughput is greatly enhanced for the structural features of the MobileNetV3 model. Furthermore, the proposed processing flow improves the utilization of hardware components, thereby reducing complexity. Based on this processing flow, the paper presents a highly configurable architecture that supports the various operation modes of the MobileNetV3 model. The architecture is synthesized and laid out in TSMC 90 nm technology, and performance and area complexity are evaluated from post-layout estimations. The design achieves 197.7 FPS with a hardware complexity of 5392 KGEs for MobileNetV3-Large. Compared with the state-of-the-art MobileNet-based accelerator, the FPS of the proposed design is improved by 3.4× and the complexity is reduced by 18%.
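As background for why MobileNet-family models suit low-complexity accelerators, the sketch below (illustrative only, not taken from the paper; the layer dimensions are arbitrary example values) compares multiply-accumulate (MAC) counts for a standard convolution versus the depthwise separable convolution that MobileNetV3 uses throughout:

```python
# Illustrative MAC-count comparison for a single convolutional layer.
# Standard conv: every output channel filters every input channel.
# Depthwise separable conv: a per-channel k x k depthwise conv followed
# by a 1 x 1 pointwise conv, which cuts MACs by roughly k*k for large
# channel counts. Shape values are hypothetical examples.

def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a k x k standard convolution over an h x w x c_in input."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 56, 56, 64, 128, 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard: {std} MACs, separable: {sep} MACs, "
          f"reduction: {std / sep:.1f}x")
```

For this example layer the separable form needs roughly 8× fewer MACs, which is the structural property an accelerator's data flow must exploit to translate the model-level savings into throughput.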


Data Availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Funding

This study was funded by the Ministry of Science and Technology, Taiwan, under grants MOST 110-2221-E-011-155 and MOST 111-2221-E-011-136-MY3 (PI: Chung-An Shen).

Author information

Corresponding author: Chung-An Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, HW., Shen, CA. The Design of Efficient Data Flow and Low-Complexity Architecture for a Highly Configurable CNN Accelerator. Circuits Syst Signal Process 42, 4759–4783 (2023). https://doi.org/10.1007/s00034-023-02331-4
