
The Design of Efficient Data Flow and Low-Complexity Architecture for a Highly Configurable CNN Accelerator

Published in: Circuits, Systems, and Signal Processing

Abstract

This paper presents a highly configurable, low-complexity CNN accelerator based on the MobileNetV3 model. To the best of the authors' knowledge, this is the first CNN accelerator design based on MobileNetV3. A highly efficient processing flow and memory-access scheme are proposed so that throughput is greatly enhanced for the structural features of the MobileNetV3 model. Furthermore, the proposed processing flow improves the utilization of hardware components, thereby reducing complexity. Based on this processing flow, the paper presents a highly configurable architecture that supports the various operation modes of the MobileNetV3 model. The architecture is synthesized and laid out in TSMC 90 nm technology, and performance and area complexity are evaluated from post-layout estimations. The design achieves 197.7 FPS with a hardware complexity of 5392 KGEs for MobileNetV3-Large. Compared with the state-of-the-art MobileNet-based accelerator, the FPS of the proposed design is improved by 3.4× and the complexity is reduced by 18%.
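As background for why MobileNet-family models suit low-complexity accelerators, the sketch below (illustrative only, not taken from the paper; the layer dimensions are arbitrary example values) compares multiply-accumulate (MAC) counts for a standard convolution versus the depthwise separable convolution that MobileNetV3 uses throughout:

```python
# Illustrative MAC-count comparison for a single convolutional layer.
# Standard conv: every output channel filters every input channel.
# Depthwise separable conv: a per-channel k x k depthwise conv followed
# by a 1 x 1 pointwise conv, which cuts MACs by roughly k*k for large
# channel counts. Shape values are hypothetical examples.

def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a k x k standard convolution over an h x w x c_in input."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 56, 56, 64, 128, 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard: {std} MACs, separable: {sep} MACs, "
          f"reduction: {std / sep:.1f}x")
```

For this example layer the separable form needs roughly 8× fewer MACs, which is the structural property an accelerator's data flow must exploit to translate the model-level savings into throughput.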


Data Availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.


Funding

This study was funded by the Ministry of Science and Technology, Taiwan, under grants MOST 110-2221-E-011-155 and MOST 111-2221-E-011-136-MY3 (PI: Chung-An Shen).

Author information

Corresponding author: Chung-An Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, HW., Shen, CA. The Design of Efficient Data Flow and Low-Complexity Architecture for a Highly Configurable CNN Accelerator. Circuits Syst Signal Process 42, 4759–4783 (2023). https://doi.org/10.1007/s00034-023-02331-4
