Abstract
Field-Programmable Gate Arrays (FPGAs) are increasingly being explored for accelerating Convolutional Neural Networks (CNNs) due to their efficient energy consumption and robust performance. For low-power edge deployment, FPGA-based CNN accelerators typically adopt spatial unrolling architectures. These designs not only achieve high computational efficiency but also feature reduced latency between data transfer and storage access, with low power consumption. Nonetheless, these accelerators may not perform as well with convolutional layers that have large input sizes but few channels. The complexity involved in managing spatial unrolling can hinder their large-scale implementation in integrated circuits. To meet these challenges, this paper presents a new computing architecture called the Deformation Systolic Array (DSA). It starts by designing configurable processing elements (PEs). The architecture uses a designed feature pumping (F-P) method as its dataflow to minimize delays. Additionally, a data broadcasting approach is employed across PEs using a systolic array, enhancing data reuse. The scalable design allows adaptation to varying resource capacities and computational requirements. Furthermore, a scheduling policy has been developed that enables PEs to follow different parallel processing modes depending on the number of channels, size, and type of the convolutional layer. The evaluation experiments demonstrate that, compared to the NVIDIA RTX 3090 GPU and the SIYUAN370 ASIC, DSA-CNN achieves that speedups of 2.10 \(\times \) and 1.89 \(\times \) , respectively, when deploying the lightweight object detection network SSD-MobileNetV1-300 on the VU13P.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Availability of data and materials
The datasets used or analysed during the current study are available from the corresponding author on reasonable request.
Code availability
The code is available from the corresponding author on reasonable request.
References
Ali A, Zhu Y, Zakarya M (2021) Exploiting dynamic spatio-temporal correlations for citywide traffic flow prediction using attention based neural networks. Inf Sci 577:852–870
Ali A, Zhu Y, Zakarya M (2022) Exploiting dynamic spatio-temporal graph convolutional neural networks for citywide traffic flows prediction. Neural Netw 145:233–247
Ali A, Zhu Y, Zakarya M (2021) A data aggregation based approach to exploit dynamic spatio-temporal correlations for citywide crowd flows prediction in fog computing. Multimed Tools Appl 80(20):31401–31433
The Ho QN, Do TT, Minh PS, Nguyen VT, Nguyen VTT (2023) Turning chatter detection using a multi-input convolutional neural network via image and sound signal. Mach 11(6):644
Yuan T, Liu W, Han J, Lombardi F (2021) High performance cnn accelerators based on hardware and algorithm co-optimization. IEEE Trans Circ Syst I Regular Papers 68(1):250–263. https://doi.org/10.1109/TCSI.2020.3030663
Choquette J, Gandhi W, Giroux O, Stam N, Krashinsky R (2021) Nvidia a100 tensor core gpu: Performance and innovation. IEEE Micro 41(2):29–35
Choquette J, Gandhi W (2020) Nvidia a100 gpu: Performance & innovation for gpu computing. In: 2020 IEEE Hot Chips 32 Symposium (HCS), pp 1–43
Koppe G, Meyer-Lindenberg A, Durstewitz D (2021) Deep learning for small and big data in psychiatry. Neuropsychopharmacology 46(1):176–190
Yu Y, Zhao T, He L (2020) Light-opu: An fpga-based overlay processor for lightweight convolutional neural networks, pp 122–132
Chen X, Li J, Zhao Y (2021) Hardware resource and computational density efficient cnn accelerator design based on fpga. In: 2021 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), pp 204–205. https://doi.org/10.1109/ICTA53157.2021.9661886
Li H, Gong L, Wang C, Zhou X (2023) A flexible dataflow cnn accelerator on fpga. In: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), pp 302–304. https://doi.org/10.1109/CCGridW59191.2023.00065
Nguyen D.T, Nguyen T.N, Kim H, Lee H.J (2019) A high-throughput and power-efficient fpga implementation of yolo cnn for object detection. IEEE Trans Very Large Scale Integ (VLSI) Syst 1–13
Zhang W, Qiao L, Hsu W, Cui Y, Jiang M, Luo G (2021) Fpga acceleration for 3-d low-dose tomographic reconstruction. IEEE Trans Comput-Aid Des Integ Circ Syst 40(4):666–679
Xia M, Huang Z, Tian L, Wang H, Feng S (2021) Sparknoc: An energy-efficiency fpga-based accelerator using optimized lightweight cnn for edge computing.J Syst Archit 115(4):101991
Liu D, Yang C, Li S, Chen X, Ren J, Liu R, Duan M, Tan Y, Liang L (2019) Fitcnn: A cloud-assisted and low-cost framework for updating cnns on iot devices. Futur Gener Comput Syst 91:277–289
Bai L, Zhao Y, Huang X () A cnn accelerator on fpga using depthwise separable convolution. IEEE Trans Circ Syst II Express Briefs 65(10):1415–1419
Betz V, Rose J (2000) Automatic generation of fpga routing architectures from high-level descriptions. In: Proceedings of the 2000 ACM/SIGDA Eighth International Symposium on Field Programmable Gate Arrays. FPGA ’00, pp 175–184. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/329166.329203
Bing L, Zou D, Lei F, Shou F, Ping F (2019) An fpga-based cnn accelerator integrating depthwise separable convolution. Electr 8(3):281
Samajdar A, Joseph J.M, Zhu Y, Whatmough P, Mattina M, Krishna T (2020) A systematic methodology for characterizing scalability of dnn accelerators using scale-sim. In: 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 58–68
Gong L, Wang C, Li X, Chen H, Zhou X (2018) Maloc: A fully pipelined fpga accelerator for convolutional neural networks with all layers mapped on chip. IEEE Trans Comput-Aid Des Integr Circ Syst 37(11):2601–2612
Xu R, Ma S, Guo Y, Li D (2023) A survey of design and optimization for systolic array-based dnn accelerators. ACM Comput Surv 56(1)
Zhang J, Zhang W, Luo G, Wei X, Liang Y, Cong J (2019) Frequency improvement of systolic array-based cnns on fpgas. In: 2019 IEEE International Symposium on Circuits and Systems (ISCAS), pp 1–4. https://doi.org/10.1109/ISCAS.2019.8702071
Li B, Wang H, Zhang X, Ren J, Liu L, Sun H, Zheng N (2021) Dynamic dataflow scheduling and computation mapping techniques for efficient depthwise separable convolution acceleration. IEEE transactions on circuits and systems, I. Regular papers: a publication of the IEEE Circuits and Systems Society (8):68
Ding W, Huang Z, Huang Z.A, Tian L.A, Wang H.A, Feng SA (2019) Designing efficient accelerator of depthwise separable convolutional neural network on fpga.J Syst Archit 97:278–286
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp 248–255. Ieee
Kamal MS, Razzak SA, Hossain MM (2016) Catalytic oxidation of volatile organic compounds (vocs)-a review. Atmos Environ 140:117–134
Krizhevsky A, Sutskever I, Hinton G.E (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Howard AG (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V et al (2019) Searching for mobilenetv3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 1314–1324
Redmon J (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767
Naik BT, Hashmi MF (2023) Mobilenet+ ssd: Lightweight network for real-time detection of basketball player. In: Proceedings of the International Conference on Paradigms of Computing, Communication and Data Sciences: PCCDS 2022, pp 11–19. Springer
Cai K, Miao X, Wang W, Pang H, Liu Y, Song J (2020) A modified yolov3 model for fish detection based on mobilenetv1 as backbone. Aquac Eng 91:102117
Zhang C, Sun G, Fang Z, Zhou P, Pan P, Cong J (2019) Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks. IEEE Trans Comput-Aid Des Integr Circ Syst 38(11):2072–2085
Venieris SI, Bouganis CS (2019) fpgaconvnet: Mapping regular and irregular convolutional neural networks on fpgas. IEEE Trans Neural Netw Learn Syst 30(2):326–342
Chang JW, Kang SJ (2018) Optimizing fpga-based convolutional neural networks accelerator for image super-resolution. In: 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), pp 343–348
Zhang J, Li J (2017) Improving the performance of opencl-based fpga accelerator for convolutional neural network. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. FPGA ’17, pp 25–34. Association for Computing Machinery, New York, NY, USA
Suda N, Chandra V, Dasika G, Mohanty A, Ma Y, Vrudhula S, Seo J.S, Cao Y (2016) Throughput-optimized opencl-based fpga accelerator for large-scale convolutional neural networks. In: Acm/sigda International Symposium, pp 16–25
Wu D, Zhang Y, Jia X, Tian L, Li T, Sui L, Xie D, Shan Y (2019) A high-performance cnn processor based on fpga for mobilenets. In: 2019 29th International Conference on Field Programmable Logic and Applications (FPL), pp 136–143. IEEE
Su J, Faraone J, Liu J, Zhao Y, Thomas DB, Leong PH, Cheung PY (2018) Redundancy-reduced mobilenet acceleration on reconfigurable logic for imagenet classification. In: Applied Reconfigurable Computing. Architectures, Tools, and Applications: 14th International Symposium, ARC 2018, Santorini, Greece, May 2-4, 2018, Proceedings 14, pp 16–28. Springer
Yu Y, Zhao T, Wang K, He L (2020) Light-opu: An fpga-based overlay processor for lightweight convolutional neural networks. In: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp 122–132
Acknowledgements
This work is partially supported by the Special Key Project of Technological Innovation and Application Development of Chongqing (CSTB2022TIAD-KPX0057), the Natural Science Foundation Innovation and Development Joint Fund of Chongqing (CSTB2022NSCQ-LZX0074)
Author information
Authors and Affiliations
Contributions
Yi Wan: Writing-original draft, Writing-review editing, Investigation and Software. Junfan Chen: Software. Xiong Yang: Software. Hailong Zhang: Software. Chao Huang: Visualization and Data curation. Xianzhong Xie: Methodology and Conceptualization.
Corresponding author
Ethics declarations
Conflict of interest/Competing interests
Not applicable
Consent to participate
Not applicable
Consent for publication
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wan, Y., Chen, J., Yang, X. et al. DSA-CNN: an fpga-integrated deformable systolic array for convolutional neural network acceleration. Appl Intell 55, 65 (2025). https://doi.org/10.1007/s10489-024-05898-w
Accepted:
Published:
DOI: https://doi.org/10.1007/s10489-024-05898-w