Abstract
Given the performance, size, and power constraints of edge computing, a single chip based on a Field Programmable Gate Array (FPGA), with its parallel execution, flexible configuration, and power efficiency, is well suited to accelerating Convolutional Neural Networks (CNNs). However, implementing a lightweight CNN with limited on-chip resources while maintaining high computing efficiency and utilization remains challenging. To achieve efficient single-chip acceleration, we implement a Network-on-Chip (NoC) built from Processing Elements (PEs), each consisting of multiple node arrays. Moreover, the computing and memory efficiency of each PE is optimized through function sharing and a hybrid memory. To maximize resource utilization, a theoretical model is constructed to explore the parallel parameters and running cycles of each PE. In experiments on LeNet and MobileNet, resource utilization of 83.61% and 95.28% is achieved, with throughputs of 53.3 and 41.9 Giga Operations Per Second (GOPS), respectively. Power measurements show that power efficiency reaches 77.25 GOPS/W and 85.51 GOPS/W on our platform, which is sufficient for efficient inference in edge computing.
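The abstract mentions a theoretical model that explores each PE's parallel parameters and running cycles to maximize resource utilization. A minimal sketch of that kind of design-space exploration is shown below; the cost model, the `explore` function, the candidate parallelism factors, and the DSP budget are all illustrative assumptions, not the paper's actual model.

```python
import math
from itertools import product

def explore(layer_macs, dsp_budget, candidates=(1, 2, 4, 8, 16, 32)):
    """Hypothetical exploration: pick (input-channel, output-channel)
    parallelism factors that minimize total running cycles while the
    implied number of MAC lanes stays within the DSP budget."""
    best = None
    for p_in, p_out in product(candidates, repeat=2):
        lanes = p_in * p_out          # one DSP-based MAC per parallel lane
        if lanes > dsp_budget:
            continue
        # Cycles per layer: MAC count divided by lanes, rounded up.
        cycles = sum(math.ceil(m / lanes) for m in layer_macs)
        if best is None or cycles < best[0]:
            best = (cycles, p_in, p_out)
    return best

# Example: three layers with different MAC counts, a 128-DSP budget.
cycles, p_in, p_out = explore([1_000_000, 250_000, 60_000], 128)
```

Under this toy cost model the search simply saturates the DSP budget; the paper's model additionally accounts for memory bandwidth and per-PE scheduling, which such a sketch omits.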










Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62171156.
Cite this article
Wu, R., Liu, B., Fu, P. et al. An efficient lightweight CNN acceleration architecture for edge computing based-on FPGA. Appl Intell 53, 13867–13881 (2023). https://doi.org/10.1007/s10489-022-04251-3