FPGA-based accelerator for object detection: a comprehensive survey

The Journal of Supercomputing

Abstract

Object detection is one of the most challenging tasks in computer vision. With advances in semiconductor devices and chip technology, hardware accelerators have come into wide use. Field-programmable gate arrays (FPGAs) are a highly flexible hardware platform whose circuits can be reconfigured for a specific workload, which gives them the potential to improve the efficiency of object detection accelerators. However, few reviews summarize FPGA-based object detection accelerators, and there is no general principle for implementing object detection according to FPGA characteristics. In this paper, current hardware accelerators are first introduced and compared. Then, typical deep learning-based object detectors are summarized. Next, the questions of “Why choose FPGA,” “The design goals of FPGA accelerators” and “The design methods for FPGA accelerators” are discussed in detail. Finally, the challenges of object detection algorithms, hardware, and co-design are presented. In addition, an online platform (https://github.com/vivian13maker/) has been constructed to provide specific information on the advanced works surveyed.

Acknowledgements

The authors gratefully acknowledge support from the National Natural Science Foundation of China (No. 61971208, No. 61702128), Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018), Yunnan Young Top Talents of Ten Thousands Plan (Shen Tao, Zhu Yan, Yunren Social Development No. 2018 73), Major Science and Technology Projects in Yunnan Province (202002AB080001-8), and the Development and Application of Blockchain Service Platform Supporting Regional Integrated Energy Transactions Project of China (No. SGIT0000XTJS1900433).

Author information

Corresponding author

Correspondence to Tao Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zeng, K., Ma, Q., Wu, J.W. et al. FPGA-based accelerator for object detection: a comprehensive survey. J Supercomput 78, 14096–14136 (2022). https://doi.org/10.1007/s11227-022-04415-5

  • DOI: https://doi.org/10.1007/s11227-022-04415-5
