
Performance-oriented FPGA-based convolution neural network designs

Published in: Multimedia Tools and Applications

Abstract

The convolutional neural network (CNN) is the best-known algorithm in image recognition and classification applications. Various Field Programmable Gate Array based (FPGA-based) CNN architectures have been proposed because of their fast reconfigurability. However, high-performance designs are still needed to reduce computation time. The contributions of this paper are: 1) heterogeneous and two-dimensional dispatcher technologies that implement FPGA-based CNN accelerators at different computational levels of the CNN, reducing its computation time, and 2) a flexible, integrated, pipelined software and hardware (SW/HW) architecture that reduces the integration overhead of using a CNN framework. The experimental results show that the proposed architectures achieve the best performance with the minimum FPGA resource requirements.
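The core idea behind a two-dimensional dispatcher can be illustrated with a minimal software sketch: the convolution output is partitioned into a 2D grid of tiles, and each tile is an independent work unit that could be assigned to a separate FPGA processing element. The function, tile sizes, and names below are illustrative assumptions, not taken from the paper's design.

```python
def conv2d_tiled(image, kernel, tile_h=2, tile_w=2):
    """Valid 2D convolution computed tile by tile.

    Each (tile_h x tile_w) block of the output is an independent work
    unit; a 2D dispatcher could issue each tile to its own processing
    element. Here the tiles run sequentially for clarity.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]

    # Enumerate output tiles along two dimensions (rows and columns of tiles).
    for tr in range(0, oh, tile_h):
        for tc in range(0, ow, tile_w):
            # In hardware, the body of this tile would map to one PE.
            for r in range(tr, min(tr + tile_h, oh)):
                for c in range(tc, min(tc + tile_w, ow)):
                    acc = 0.0
                    for i in range(kh):
                        for j in range(kw):
                            acc += image[r + i][c + j] * kernel[i][j]
                    out[r][c] = acc
    return out
```

Because the tiles share no output elements, they can be computed in any order or fully in parallel, which is what makes this decomposition attractive for a grid of FPGA compute units.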





Funding

This work was supported by the Ministry of Science and Technology, Taiwan, under grant MOST 110-2221-E-024-001.

Author information

Corresponding author: Chi-Chou Kao.

Ethics declarations

Conflict of interest

To the best of our knowledge, the named authors have no conflict of interest, financial or otherwise.

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kao, CC. Performance-oriented FPGA-based convolution neural network designs. Multimed Tools Appl 82, 21019–21030 (2023). https://doi.org/10.1007/s11042-023-14537-4


