DSA-CNN: an FPGA-integrated deformable systolic array for convolutional neural network acceleration


Abstract

Field-Programmable Gate Arrays (FPGAs) are increasingly explored for accelerating Convolutional Neural Networks (CNNs) because of their energy efficiency and robust performance. For low-power edge deployment, FPGA-based CNN accelerators typically adopt spatial unrolling architectures, which achieve high computational efficiency, reduce the latency of data transfer and storage access, and consume little power. Nonetheless, such accelerators may perform poorly on convolutional layers with large input sizes but few channels, and the complexity of managing spatial unrolling can hinder large-scale implementation in integrated circuits. To meet these challenges, this paper presents a new computing architecture, the Deformation Systolic Array (DSA). The architecture is built from configurable processing elements (PEs) and adopts a purpose-designed feature pumping (F-P) dataflow to minimize delays. Data is broadcast across PEs through the systolic array, enhancing data reuse, and the scalable design adapts to varying resource capacities and computational requirements. Furthermore, a scheduling policy enables the PEs to switch between parallel processing modes according to the channel count, feature-map size, and type of each convolutional layer. Evaluation experiments show that, when deploying the lightweight object detection network SSD-MobileNetV1-300 on the VU13P, DSA-CNN achieves speedups of 2.10× and 1.89× over the NVIDIA RTX 3090 GPU and the SIYUAN370 ASIC, respectively.
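
To make the layer-aware scheduling idea above concrete, the sketch below shows how such a policy might choose a parallel processing mode from a layer's channel count, feature-map size, and type. It is only an illustrative assumption: the ConvLayer fields, the mode names, and the thresholds are invented for this example and are not taken from the paper.

```python
# Illustrative sketch of a layer-aware scheduling policy (assumption, not the
# paper's implementation): pick a parallel processing mode for the PE array
# from a convolutional layer's channel count, feature-map size, and type.
from dataclasses import dataclass


@dataclass
class ConvLayer:
    in_channels: int     # number of input channels
    feature_size: int    # spatial width/height of the input feature map
    kind: str            # "standard", "depthwise", or "pointwise"


def select_parallel_mode(layer: ConvLayer, pe_rows: int = 16) -> str:
    """Choose how the array unrolls a layer.

    Layers with large feature maps but few channels favor spatial
    (intra-feature-map) parallelism; channel-rich layers favor channel
    parallelism so every PE row stays busy.
    """
    if layer.kind == "depthwise":
        # No cross-channel reduction, so parallelize within each channel.
        return "spatial-parallel"
    if layer.in_channels < pe_rows and layer.feature_size >= 64:
        # Few channels but a big feature map: unroll over output pixels.
        return "spatial-parallel"
    # Enough channels to keep the array busy: unroll over channels.
    return "channel-parallel"


if __name__ == "__main__":
    early = ConvLayer(in_channels=3, feature_size=300, kind="standard")
    deep = ConvLayer(in_channels=512, feature_size=19, kind="pointwise")
    print(select_parallel_mode(early))  # -> spatial-parallel
    print(select_parallel_mode(deep))   # -> channel-parallel
```

In the paper itself this decision is made per layer when mapping a network such as SSD-MobileNetV1-300 onto the DSA; the sketch only mirrors the shape of that decision, not its actual criteria.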

Availability of data and materials

The datasets used or analysed during the current study are available from the corresponding author on reasonable request.

Code availability

The code is available from the corresponding author on reasonable request.

Acknowledgements

This work is partially supported by the Special Key Project of Technological Innovation and Application Development of Chongqing (CSTB2022TIAD-KPX0057) and the Natural Science Foundation Innovation and Development Joint Fund of Chongqing (CSTB2022NSCQ-LZX0074).

Author information

Authors and Affiliations

Authors

Contributions

Yi Wan: Writing - original draft, Writing - review & editing, Investigation and Software. Junfan Chen: Software. Xiong Yang: Software. Hailong Zhang: Software. Chao Huang: Visualization and Data curation. Xianzhong Xie: Methodology and Conceptualization.

Corresponding author

Correspondence to Xianzhong Xie.

Ethics declarations

Conflict of interest/Competing interests

Not applicable

Consent to participate

Not applicable

Consent for publication

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wan, Y., Chen, J., Yang, X. et al. DSA-CNN: an FPGA-integrated deformable systolic array for convolutional neural network acceleration. Appl Intell 55, 65 (2025). https://doi.org/10.1007/s10489-024-05898-w
