Abstract
Given the performance, size, and power constraints of edge computing, a single chip based on a Field Programmable Gate Array (FPGA), with its parallel execution, flexible configuration, and power efficiency, is well suited to accelerating Convolutional Neural Networks (CNNs). However, implementing a lightweight CNN with limited on-chip resources while maintaining high computing efficiency and utilization remains challenging. To achieve efficient single-chip acceleration, we implement a Network-on-Chip (NoC) built from Processing Elements (PEs), each consisting of multiple node arrays. Moreover, the computing and memory efficiency of each PE is optimized through function sharing and a hybrid memory. To maximize resource utilization, a theoretical model is constructed to explore the parallel parameters and running cycles of each PE. In experiments on LeNet and MobileNet, resource utilization of 83.61% and 95.28% is achieved, with throughputs of 53.3 and 41.9 Giga Operations Per Second (GOPS), respectively. Power measurements show that power efficiency reaches 77.25 GOPS/W and 85.51 GOPS/W on our platform, which is sufficient for efficient inference in edge computing.
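The abstract mentions a theoretical model that explores each PE's parallel parameters and running cycles to maximize resource utilization. A minimal sketch of that kind of design-space exploration is shown below; the cost model, the `explore` function, the candidate parallelism factors, and the DSP budget are all illustrative assumptions, not the paper's actual model.

```python
import math
from itertools import product

def explore(layer_macs, dsp_budget, candidates=(1, 2, 4, 8, 16, 32)):
    """Hypothetical exploration: pick (input-channel, output-channel)
    parallelism factors that minimize total running cycles while the
    implied number of MAC lanes stays within the DSP budget."""
    best = None
    for p_in, p_out in product(candidates, repeat=2):
        lanes = p_in * p_out          # one DSP-based MAC per parallel lane
        if lanes > dsp_budget:
            continue
        # Cycles per layer: MAC count divided by lanes, rounded up.
        cycles = sum(math.ceil(m / lanes) for m in layer_macs)
        if best is None or cycles < best[0]:
            best = (cycles, p_in, p_out)
    return best

# Example: three layers with different MAC counts, a 128-DSP budget.
cycles, p_in, p_out = explore([1_000_000, 250_000, 60_000], 128)
```

Under this toy cost model the search simply saturates the DSP budget; the paper's model additionally accounts for memory bandwidth and per-PE scheduling, which such a sketch omits.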










Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 62171156.
Cite this article
Wu, R., Liu, B., Fu, P. et al. An efficient lightweight CNN acceleration architecture for edge computing based-on FPGA. Appl Intell 53, 13867–13881 (2023). https://doi.org/10.1007/s10489-022-04251-3