Abstract
This paper presents a highly configurable, low-complexity CNN accelerator for the MobileNetV3 model. To the best of the authors' knowledge, this is the first CNN accelerator designed for MobileNetV3. A highly efficient processing flow and memory-access scheme are proposed that substantially enhance throughput for the structural features of MobileNetV3. Furthermore, the proposed processing flow improves the utilization of hardware components, thereby reducing complexity. Based on this processing flow, a highly configurable architecture is presented that supports the various operation modes of MobileNetV3. The architecture is synthesized and laid out in TSMC 90 nm technology, and performance and area are evaluated from post-layout estimations. The design achieves 197.7 FPS with a hardware complexity of 5392 KGEs for MobileNetV3-Large. Compared with the state-of-the-art MobileNet-based accelerator, the proposed design improves FPS by 3.4× and reduces complexity by 18%.
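The structural feature that distinguishes the MobileNet family, and that such accelerators are organized around, is the depthwise separable convolution: a per-channel (depthwise) spatial filter followed by a 1×1 (pointwise) channel-mixing convolution. The sketch below is purely illustrative of that operation; it does not represent the paper's processing flow or hardware data path, and all names are the author's of this note (a minimal NumPy reference, stride 1, no padding):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_kernels):
    """Reference depthwise separable convolution (valid padding, stride 1).

    x          : input feature map, shape (H, W, C_in)
    dw_kernels : depthwise filters, shape (K, K, C_in) -- one KxK filter per channel
    pw_kernels : pointwise weights, shape (C_in, C_out) -- the 1x1 convolution
    """
    H, W, C_in = x.shape
    K = dw_kernels.shape[0]
    Ho, Wo = H - K + 1, W - K + 1

    # Depthwise stage: each input channel is filtered independently,
    # so no cross-channel accumulation happens here.
    dw_out = np.zeros((Ho, Wo, C_in))
    for c in range(C_in):
        for i in range(Ho):
            for j in range(Wo):
                dw_out[i, j, c] = np.sum(x[i:i + K, j:j + K, c] * dw_kernels[:, :, c])

    # Pointwise stage: a 1x1 convolution mixes channels at every pixel,
    # which is just a matrix product over the channel dimension.
    return dw_out @ pw_kernels
```

Splitting the convolution this way reduces multiply-accumulate count from roughly K·K·C_in·C_out per output pixel to K·K·C_in + C_in·C_out, which is why the two stages place very different demands on the memory-access scheme of an accelerator.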
Data Availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This study was funded by the Ministry of Science and Technology, Taiwan, under grants MOST 110-2221-E-011-155 and MOST 111-2221-E-011-136-MY3 (Chung-An Shen).
Cite this article
Liu, HW., Shen, CA. The Design of Efficient Data Flow and Low-Complexity Architecture for a Highly Configurable CNN Accelerator. Circuits Syst Signal Process 42, 4759–4783 (2023). https://doi.org/10.1007/s00034-023-02331-4