FPGA-based accelerator for object detection: a comprehensive survey

Zeng, Kai; Ma, Qian; Wu, Jia Wen; Chen, Zhe; Shen, Tao; Yan, Chenggang

doi:10.1007/s11227-022-04415-5

FPGA-based accelerator for object detection: a comprehensive survey

Published: 29 March 2022

Volume 78, pages 14096–14136, (2022)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

Kai Zeng^1,2,
Qian Ma^1,2,
Jia Wen Wu^1,2,
Zhe Chen^1,2,
Tao Shen^1,2 &
…
Chenggang Yan³

4115 Accesses
30 Citations
Explore all metrics

Abstract

Object detection is one of the most challenging tasks in computer vision. With the advances in semiconductor devices and chip technology, hardware accelerators have been widely used. Field-programmable gate arrays (FPGAs) are a highly flexible hardware platform that allows customized reconfiguration of the integrated circuit, which has the potential to improve the efficiency of object detection accelerators. However, few reviews summarize FPGA-based object detection accelerators. Also, there is no general principle for realizing object detection according to FPGA characteristics. In this paper, the current hardware accelerators are introduced and compared. Then, the typical deep learning-based object detectors are summarized. Next, the questions of “Why choose FPGA,” “The design goals of FPGA accelerators” and “The design methods for FPGA accelerators” are discussed in detail. Finally, the challenges of object detection algorithms, hardware, and co-design are presented. In addition, an online platform (https://github.com/vivian13maker/) is constructed to provide specific information on all advanced works.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 10

High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection

A dedicated hardware accelerator for real-time acceleration of YOLOv2

Article 09 May 2020

Enabling real-time object detection on low cost FPGAs

Article 30 October 2021

References

Cheng Z, Zhu X, Gong S (2020) Face re-identification challenge: Are face recognition models good enough? Pattern Recognit 107:107422
Article Google Scholar
Xu Y, Zhang Z, Lu G, Yang J (2016) Approximately symmetrical face images for image preprocessing in face recognition and sparse representation based classification. Pattern Recognit 54:68–82
Article Google Scholar
Peng C, Wang N, Li J, Gao X (2019) Dlface: deep local descriptor for cross-modality face recognition. Pattern Recognit 90:161–171
Article Google Scholar
Schroff F, Kalenichenko D, Philbin J (2015) Facenet: A unified embedding for face recognition and clustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 815–823
Saeidi M, Ahmadi A (2020) High-performance and deep pedestrian detection based on estimation of different parts. J Supercomput 77:2033–2068
Article Google Scholar
Han SS, Kim YK, Jeon YB, Park J, Park DS, Hwang DH, Jeong CS (2020) Distributed deep learning platform for pedestrian detection on it convergence environment. J Supercomput 76:5460–5485
Article Google Scholar
Hua W, Mu D, Zheng Z, Guo D (2017) Online multi-person tracking assist by high-performance detection. J Supercomput 76:4076–4094
Article Google Scholar
Zaghari N, Fathy M, Jameii SM, Shahverdy M (2021) The improvement in obstacle detection in autonomous vehicles using yolo non-maximum suppression fuzzy algorithm. J Supercomput 55:1–26
Google Scholar
Zaghari N, Fathy M, Jameii SM, Sabokrou M, Shahverdy M (2020) Improving the learning of self-driving vehicles based on real driving behavior using deep neural network techniques
Atahary T, Taha T, Douglass S (2020) Parallelized path-based search for constraint satisfaction in autonomous cognitive agents. J Supercomput 77:1667–1692
Article Google Scholar
Cho S, Cho K (2019) Real-time 3d reconstruction method using massive multi-sensor data analysis and fusion. J Supercomput 75:3229–3248
Article Google Scholar
Zhang W, Cho S, Chae J, Sung Y, Cho K (2018) Object tracking method based on data computing. J Supercomput 75:3217–3228
Article Google Scholar
Constantinescu DA, Navarro A, Corbera F, Fernández-Madrigal J, Asenjo R (2020) Efficiency and productivity for decision making on low-power heterogeneous cpu+gpu socs. J Supercomput 77:44–65
Article Google Scholar
Hao X, Zhang G, Ma S (2016) Deep learning. Int J Semantic Comput 10:417
Article Google Scholar
Goodfellow I, Bengio Y, Courville AC (2015) Deep learning. Nature 521:436–444
Article MATH Google Scholar
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. 2012 IEEE Conference on Computer Vision and Pattern Recognition pp 3354–3361
Kyrkou C, Theocharides T (2012) A parallel hardware architecture for real-time object detection with support vector machines. IEEE Trans Computers 61:831–842
Article MathSciNet MATH Google Scholar
Hsiao P, Lin SY, Huang SS (2015) An fpga based human detection system with embedded platform. Microelectron Eng 138:42–46
Article Google Scholar
Feng X, Jiang Y, Yang X, Du M, Li X (2019) Computer vision algorithms and hardware implementations: a survey. Integration 69:309–320
Article Google Scholar
B C, S O (2019) Hardware designs for histogram of oriented gradients in pedestrian detection: A survey. 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS) pp 849–854
Borrego-Carazo J, Castells-Rufas D, Biempica E, Carrabina J (2020) Resource-constrained machine learning for adas: a systematic review. IEEE Access 8:40573–40598
Article Google Scholar
Li T, Ma Y, Endoh T (2020) A systematic study of tiny yolo3 inference: toward compact brainware processor with less memory and logic gate. IEEE Access 8:142931–142955
Article Google Scholar
Talib M, Majzoub S, Nasir Q, Jamal D (2020) A systematic literature review on hardware implementation of artificial intelligence algorithms. J Supercomput 77:1897–1938
Article Google Scholar
Xiyuan P, Jinxiang Y, Bowen Y, Liansheng L, Peng Y (2021) A review of fpga-based custom computing architecture for convolutional neural network inference. Chinese J Electron 30:1–17
Article Google Scholar
Li Y, Wang S, Tian Q, Ding X (2015) Feature representation for statistical-learning-based object detection: a review. Pattern Recognit 48:3542–3559
Article Google Scholar
Zhiqiang W, Jun L (2017) A review of object detection based on convolutional neural network. 2017 36th Chinese Control Conference (CCC) pp 11104–11109
Sharma K, Thakur NV (2017) A review and an approach for object detection in images. Int J Comput Vision Robot 7:196–237
Article Google Scholar
Tao Y, Ma R, Shyu M, Chen SC (2020) Challenges in energy-efficient deep neural network training with fpga. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) pp 1602–1611
Rodríguez-Andina J, Pena MDV, Moure MJ (2015) Advanced features and industrial applications of fpgas-a review. IEEE Trans Indus Inform 11:853–864
Article Google Scholar
Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 1–9
Shawahna A, Sait SM, El-Maleh A (2019) Fpga-based accelerators of deep learning networks for learning and classification: A review. IEEE Access 7:7823–7859
Article Google Scholar
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556
Nurvitadhi E, Venkatesh G, Sim J, Marr D, Huang R, Hock JOG, Liew YT, Srivatsan K, Moss DJM, Subhaschandra S, Boudoukh G (2017) Can fpgas beat gpus in accelerating next-generation deep neural networks? Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Garland M, Grand SML, Nickolls J, Anderson J, Hardwick J, Morton S, Phillips EH, Zhang Y, Volkov V (2008) Parallel computing experiences with cuda. IEEE Micro 28:81
Article Google Scholar
Stone J, Gohara D, Shi G (2010) Opencl: a parallel programming standard for heterogeneous computing systems. Comput Sci Eng 12:66–73
Article Google Scholar
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick RB, Guadarrama S, Darrell T (2014) Caffe: Convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: An imperative style, high-performance deep learning library. In: NeurIPS
Ma Y, Yu D, Wu T, Wang H (2019) Paddlepaddle: An open-source deep learning platform from industrial practice
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zhang X (2016) Tensorflow: A system for large-scale machine learning. In: OSDI
Esmaeilzadeh H, Sampson A, Ceze L, Burger D (2013) Neural acceleration for general-purpose approximate programs. IEEE Micro 33:16–27
Article Google Scholar
Wang Y, Wei GY, Brooks D (2019) Benchmarking tpu, gpu, and cpu platforms for deep learning. http://arxiv.org/abs/1907.10701
Akopyan F, Sawada J, Cassidy A, Alvarez-Icaza R, Arthur J, Merolla P, Imam N, Nakamura Y, Datta P, Nam GJ, Taba B, Beakes M, Brezzo B, Kuang JB, Manohar R, Risk W, Jackson B, Modha D (2015) Truenorth: design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip. IEEE Trans Computer-Aided Des Integ Circuits Syst 34:1537–1557
Article Google Scholar
Chen T, Du Z, Sun N, Wang J, Wu C, Chen Y, Temam O (2014) Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems
Jouppi NP, Young C, Patil N, Patterson D, Agrawal G, Bajwa R, Bates S, Bhatia S, Boden N, Borchers A, et al. (2017) In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, pp 1–12
Ebeling C, Cronquist DC, Franklin P (1997) Configurable computing: the catalyst for high-performance architectures. Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors pp 364–372
Herbordt M, Gu Y, Court T, Model J, Sukhwani B, Chiu M (2008) Computing models for fpga-based accelerators. Comput Sci Eng 10:51
Article Google Scholar
Lacey G, Taylor GW, Areibi S (2016) Deep learning on fpgas: Past, present, and future. http://arxiv.org/abs/1602.04283
Cong J, Fang Z, Lo M, Wang H, Xu J, Zhang S (2018) Understanding performance differences of fpgas and gpus. 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) pp 93–96
Bajestani MF, Ghasemi M, Vrudhula S, Yang Y (2020) Enabling incremental knowledge transfer for object detection at the edge. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) pp 1591–1599
Dang V, Skadron K (2017) Acceleration of frequent itemset mining on fpga using sdaccel and vivado hls. 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) pp 195–200
Kathail V (2020) Xilinx vitis unified software platform. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Felzenszwalb PF, Girshick RB, McAllester DA, Ramanan D (2009) Object detection with discriminatively trained part based models. IEEE Trans Pattern Anal Mach Intell 32:1627–1645
Article Google Scholar
Girshick RB, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE Conference on Computer Vision and Pattern Recognition pp 580–587
Zhang C, Li P, Sun G, Guan Y, Xiao B, Cong J (2015) Optimizing fpga-based accelerator design for deep convolutional neural networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Zhang J, Jin X, Sun J, Wang J, Sangaiah AK (2018) Spatial and semantic convolutional features for robust visual object tracking. Multim Tools Appl 79:15095–15115
Article Google Scholar
Hui Q (2019) Motion video tracking technology in sports training based on mean-shift algorithm. J Supercomput 75:6021–6037
Article Google Scholar
Ding P, Zhang J, Zhou H, Zou X, Wang M (2020) Pyramid context learning for object detection. J Supercomput 64:1–14
Google Scholar
Taranto-Vera G, Galindo-Villardón P, Merchán-Sánchez-Jara J, Salazar-Pozo J, Moreno-Salazar A, Salazar-Villalva V (2021) Algorithms and software for data mining and machine learning: a critical comparative view from a systematic review of the literature. J Supercomput 23:1–33
Google Scholar
Zhang D, Liang Z, Yang G, Li Q, Li L, Sun X (2017) A robust forgery detection algorithm for object removal by exemplar-based image inpainting. Multim Tools Appl 77:11823–11842
Article Google Scholar
Liang Z, Yang G, Ding X, Li L (2015) An efficient forgery detection algorithm for object removal by exemplar-based image inpainting. J Vis Commun Image Represent 30:75–85
Article Google Scholar
Shehab M, Al-Ayyoub M, Jararweh Y, Jarrah M (2016) Accelerating compute-intensive image segmentation algorithms using gpus. J Supercomput 73:1929–1951
Article Google Scholar
Li W, Ding S, Chen Y, Wang H, Yang S (2018) Transfer learning-based default prediction model for consumer credit in china. J Supercomput 75:862–884
Article Google Scholar
Viola PA, Jones MJ (2001) Rapid object detection using a boosted cascade of simple features. In: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) 1:886–893 vol. 1
Girshick RB (2015) Fast r-cnn. 2015 IEEE International Conference on Computer Vision (ICCV) pp 1440–1448
Ren S, He K, Girshick RB, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149
Article Google Scholar
Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. http://arxiv.org/abs/1605.06409
He K, Gkioxari G, Dollár P, Girshick RB (2020) Mask r-cnn. IEEE Trans Pattern Anal Mach Intell 42:386–397
Article Google Scholar
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37:1904–1916
Article Google Scholar
Lin TY, Dollár P, Girshick RB, He K, Hariharan B, Belongie SJ (2017) Feature pyramid networks for object detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 936–944
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition pp 8759–8768
Tan M, Le QV (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. http://arxiv.org/abs/1905.11946
Zhang D, Zhang H, Tang J, Wang M, Hua X, Sun Q (2020) Feature pyramid transformer. http://arxiv.org/abs/2007.09451
Qiao S, Chen LC, Yuille A (2021) Detectors: Detecting objects with recursive feature pyramid and switchable atrous convolution. In: CVPR
Duan K, Xie L, Qi H, Bai S, Huang Q, Tian Q (2020) Corner proposal network for anchor-free, two-stage object detection. In: ECCV
Cai L, Zhao B, Wang Z, Lin J, Foo CS, Aly M, Chandrasekhar V (2019) Maxpoolnms: Getting rid of nms bottlenecks in two-stage object detectors. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp 9348–9356
Zhang T, Lin J, Hu P, Zhao B, Aly M (2021) Psrr-maxpoolnms: Pyramid shifted maxpoolnms with relationship recovery. http://arxiv.org/abs/2105.12990
Redmon J, Divvala S, Girshick RB, Farhadi A (2016) You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 779–788
Redmon J, Farhadi A (2017) Yolo9000: Better, faster, stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp 6517–6525
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. http://arxiv.org/abs/1804.02767
Bochkovskiy A, Wang CY, Liao H (2020) Yolov4: Optimal speed and accuracy of object detection. http://arxiv.org/abs/2004.10934
Liu W, Anguelov D, Erhan D, Szegedy C, Reed SE, Fu CY, Berg A (2016) Ssd: Single shot multibox detector. In: ECCV
Fu CY, Liu W, Ranga A, Tyagi A, Berg A (2017) Dssd : Deconvolutional single shot detector. http://arxiv.org/abs/1701.06659
Zhou S, Qiu J (2021) Enhanced ssd with interactive multi-scale attention features for object detection. Multimedia Tools and Applications pp 1–18
Lin TY, Goyal P, Girshick RB, He K, Dollár P (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42:318–327
Article Google Scholar
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. http://arxiv.org/abs/2005.12872
Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. http://arxiv.org/abs/2103.14030
Zheng M, Gao P, Wang X, Li H, Dong H (2020) End-to-end object detection with adaptive clustering transformer. http://arxiv.org/abs/2011.09315
Zhu X, Su W, Lu L, Li B, Wang X, Dai J (2021) Deformable detr: Deformable transformers for end-to-end object detection. http://arxiv.org/abs/2010.04159
Dai Z, Cai B, Lin Y, Chen J (2020) Up-detr: Unsupervised pre-training for object detection with transformers.http://arxiv.org/abs/2011.09094
Everingham M, Gool L, Williams CKI, Winn J, Zisserman A (2009) The pascal visual object classes (voc) challenge. Int J Computer Vision 88:303–338
Article Google Scholar
Lin TY, Maire M, Belongie SJ, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: ECCV
Zou Z, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: A survey. http://arxiv.org/abs/1905.05055
Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimedia Tools and Applications pp 1–63
Zaidi SSA, Ansari MS, Aslam A, Kanwal N, Asghar M, Lee B (2021) A survey of modern deep learning based object detection models. http://arxiv.org/abs/2104.11892
Fan H, Liu S, Ferianc M, Ng HC, Que Z, Liu S, Niu X, Luk W (2018) A real-time object detection accelerator with compressed ssdlite on fpga. 2018 International Conference on Field-Programmable Technology (FPT) pp 14–21
Zhang S, Wen L, Bian X, Lei Z, Li S (2018) Single-shot refinement neural network for object detection. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition pp 4203–4212
Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2019) M2det: A single-shot object detector based on multi-level feature pyramid network. In: AAAI
Nguyen DT, Nguyen TN, Kim H, Lee HJ (2019) A high-throughput and power-efficient fpga implementation of yolo cnn for object detection. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27:1861–1873
Xu X, Liu B (2018) Fclnn: A flexible framework for fast cnn prototyping on fpga with opencl and caffe. 2018 International Conference on Field-Programmable Technology (FPT) pp 238–241
Wang Z, Xu K, Wu S, Liu L, Liu L, Wang D (2020) Sparse-yolo: hardware/software co-design of an fpga accelerator for yolov2. IEEE Access 8:116569–116585
Article Google Scholar
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) Centernet: Keypoint triplets for object detection. 2019 IEEE/CVF International Conference on Computer Vision (ICCV) pp 6568–6577
Kim S, Na S, Kong BY, Choi J, Park IC (2021) Real-time ssdlite object detection on fpga. IEEE Transactions on Very Large Scale Integration (VLSI) Systems PP(99):1–14
Mani V, Saravanaselvan A, Arumugam N (2022) Performance comparison of cnn, qnn and bnn deep neural networks for real-time object detection using zynq fpga node. Microelectronics Journal 119:105319 https://doi.org/10.1016/j.mejo.2021.105319, https://www.sciencedirect.com/science/article/pii/S0026269221003001
Bao C, Xie T, Feng W, Chang L, Yu C (2020) A power-efficient optimizing framework fpga accelerator based on winograd for yolo. IEEE Access 8:94307–94317
Article Google Scholar
Lu L, Liang Y, Xiao Q, Yan S (2017) Evaluating fast algorithms for convolutional neural networks on fpgas. 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) pp 101–108
Ma Y, Cao Y, Vrudhula S, sun Seo J (2017) Optimizing loop operation and dataflow in fpga acceleration of deep convolutional neural networks. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Courbariaux M, Hubara I, Soudry D, El-Yaniv R, Bengio Y (2016) Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1. http://arxiv.org/abs/Learning
Venieris SI, Bouganis C (2017) Latency-driven design for fpga-based convolutional neural networks. 2017 27th International Conference on Field Programmable Logic and Applications (FPL) pp 1–8
Zhou S, Ni Z, Zhou X, Wen H, Wu Y, Zou Y (2016) Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients. http://arxiv.org/abs/1606.06160
Alwani M, Chen H, Ferdman M, Milder P (2016) Fused-layer cnn accelerators. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) pp 1–12
Nakahara H, Yonekawa H, Fujii T, Sato S (2018) A lightweight yolov2: A binarized cnn with a parallel support vector regression for an fpga. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Nguyen DT, Kim H, Lee HJ, Chang I (2018) An approximate memory architecture for a reduction of refresh power consumption in deep learning applications. 2018 IEEE International Symposium on Circuits and Systems (ISCAS) pp 1–5
Rastegari M, Ordonez V, Redmon J, Farhadi A (2016) Xnor-net: Imagenet classification using binary convolutional neural networks. In: ECCV
Ding C, Wang S, Liu N, Xu K, Wang Y, Liang Y (2019) Req-yolo: A resource-aware, efficient quantization framework for object detection on fpgas. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Courbariaux M, Bengio Y, David J (2015) Binaryconnect: Training deep neural networks with binary weights during propagations. In: NIPS
Aydonat U, O’Connell S, Capalija D, Ling A, Chiu G (2017) An opencl™ deep learning accelerator on arria 10. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Wang D, Xu K, Jia Q, Ghiasi S (2019) Abm-spconv: A novel approach to fpga-based acceleration of convolutionai neurai network inference. 2019 56th ACM/IEEE Design Automation Conference (DAC) pp 1–6
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: ECCV
Tan M, Pang R, Le QV (2020) Efficientdet: Scalable and efficient object detection. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp 10778–10787
Ramachandran P, Zoph B, Le QV (2018) Searching for activation functions. http://arxiv.org/abs/1710.05941
Iandola FN, Moskewicz M, Ashraf K, Han S, Dally W, Keutzer K (2016) Squeezenet: Alexnet-level accuracy with 50x fewer parameters and<1mb model size. http://arxiv.org/abs/1602.07360
Lin M, Chen Q, Yan S (2014) Network in network. http://arxiv.org/abs/1312.4400
He Y, Peemen M, Waeijen L, Diken E, Fiumara M, Rauwerda G, Corporaal H, Geng T (2016) A configurable simd architecture with explicit datapath for intelligent learning. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS) pp 156–163
Loan C (1992) Computational frameworks for the fast fourier transform
Sakaguchi K, Bras RL, Bhagavatula C, Choi Y (2020) Winogrande: An adversarial winograd schema challenge at scale. http://arxiv.org/abs/1907.10641
Winograd S (1980) Arithmetic complexity of computations
Zhang C, Prasanna V (2017) Frequency domain acceleration of convolutional neural networks on cpu-fpga shared memory system. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
He Y, Lin J, Liu Z, Wang H, Li LJ, Han S (2018) Amc: Automl for model compression and acceleration on mobile devices. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 784–800
Wu J, Zhang Y, Bai H, Zhong H, Hou J, Liu W, Huang W, Huang J (2018) Pocketflow: An automated framework for compressing and accelerating deep neural networks
Yu Z, Bouganis C (2020) A parameterisable fpga-tailored architecture for yolov3-tiny. In: ARC
Li S, Luo Y, Sun K, Yadav N, Choi K (2020) A novel fpga accelerator design for real-time and ultra-low power deep convolutional neural networks compared with titan x gpu. IEEE Access 8:105455–105471
Article Google Scholar
Zhang S, Cao J, Zhang Q, Zhang Q, Zhang Y, Wang Y (2020) An fpga-based reconfigurable cnn accelerator for yolo. 2020 IEEE 3rd International Conference on Electronics Technology (ICET) pp 74–78
Ma Y, Cao Y, Vrudhula S, sun Seo J (2017) An automatic rtl compiler for high-throughput fpga implementation of diverse deep convolutional neural networks. 2017 27th International Conference on Field Programmable Logic and Applications (FPL) pp 1–8
Zhu C, Huang K, Yang S, Zhu ZQ, Zhang H, Shen H (2020) An efficient hardware accelerator for structured sparse convolutional neural networks on fpgas. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 28:1953–1965
Alwani M, Chen H, Ferdman M, Milder P (2016) Fused-layer cnn accelerators. In: 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp 1–12, https://doi.org/10.1109/MICRO.2016.7783725
Saidi A, Ben Othman S, Dhouibi M, Ben Saoud S (2021) Fpga-based implementation of classification techniques: A survey. Integration 81:280–299 https://doi.org/10.1016/j.vlsi.2021.08.004, https://www.sciencedirect.com/science/article/pii/S0167926021000894
Zhao R, Ng HC, Luk W, Niu X (2018) Towards efficient convolutional neural network for domain-specific applications on fpga. In: 2018 28th International Conference on Field Programmable Logic and Applications (FPL), pp 147–1477, https://doi.org/10.1109/FPL.2018.00033
Seto K, Nejatollahi H, An J, Kang S, Dutt N (2019) Small memory footprint neural network accelerators. In: 20th International Symposium on Quality Electronic Design (ISQED), pp 253–258, https://doi.org/10.1109/ISQED.2019.8697641
Yu F, Shen H, Zhang Z, Huang Y, Cai S, Du S (2021) A new multi-scroll chua’s circuit with composite hyperbolic tangent-cubic nonlinearity: Complex dynamics, hardware implementation and image encryption application. Integration 81:71–83
Article Google Scholar
Yu F, Liu L, Xiao L, Li K, Cai S (2019) A robust and fixed-time zeroing neural dynamics for computing time-variant nonlinear equation using a novel nonlinear activation function. Neurocomputing 350:108–116
Article Google Scholar
Yu F, Zhang Z, Shen H, Huang Y, Cai S, Jin J, Du S (2021a) Design and fpga implementation of a pseudo-random number generator based on a hopfield neural network under electromagnetic radiation. In: Frontiers in Physics
Yu F, Li L, He B, Liu L, Qian S, Zhang Z, Shen H, Cai S, Li Y (2021) Pseudorandom number generator based on a 5d hyperchaotic four-wing memristive system and its fpga implementation. Eur Phys J-special Topics 65:1–10
Google Scholar
Han S, Liu X, Mao H, Pu J, Pedram A, Horowitz M, Dally W (2016) Eie: Efficient inference engine on compressed deep neural network. 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) pp 243–254
Motamedi M, Gysel P, Ghiasi S (2017) Placid: a platform for fpga-based accelerator creation for dcnns. ACM Trans Multim Comput Commun Appl 13(1–62):21
Google Scholar
Li H, Fan X, Jiao L, Cao W, Zhou X, Wang L (2016) A high performance fpga-based accelerator for large-scale convolutional neural networks. 2016 26th International Conference on Field Programmable Logic and Applications (FPL) pp 1–9

Download references

Acknowledgements

The authors gratefully acknowledge support by the National Natural Science Foundation of China (No. 61971208, No. 61702128), Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018), Yunnan Young Top Talents of Ten Thousands Plan (Shen Tao, Zhu Yan ,Yunren Social Development No. 2018 73), Major Science and Technology Projects in Yunnan Province (202002AB080001-8), Development and Application of Blockchain Service Platform Supporting Regional Integrated Energy Transactions Project of China (No. SGIT0000XTJS1900433).

Author information

Authors and Affiliations

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, 650093, Kunming, China
Kai Zeng, Qian Ma, Jia Wen Wu, Zhe Chen & Tao Shen
Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology, 650500, Kunming, China
Kai Zeng, Qian Ma, Jia Wen Wu, Zhe Chen & Tao Shen
Automation School of Hangzhou Dianzi University, 310000, Hangzhou, China
Chenggang Yan

Authors

Kai Zeng
View author publications
You can also search for this author inPubMed Google Scholar
Qian Ma
View author publications
You can also search for this author inPubMed Google Scholar
Jia Wen Wu
View author publications
You can also search for this author inPubMed Google Scholar
Zhe Chen
View author publications
You can also search for this author inPubMed Google Scholar
Tao Shen
View author publications
You can also search for this author inPubMed Google Scholar
Chenggang Yan
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Tao Shen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zeng, K., Ma, Q., Wu, J.W. et al. FPGA-based accelerator for object detection: a comprehensive survey. J Supercomput 78, 14096–14136 (2022). https://doi.org/10.1007/s11227-022-04415-5

Download citation

Accepted: 26 February 2022
Published: 29 March 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11227-022-04415-5

Keywords

Part of a collection:

Computer Science SDG 7: Affordable and Clean Energy

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

FPGA-based accelerator for object detection: a comprehensive survey

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection

A dedicated hardware accelerator for real-time acceleration of YOLOv2

Enabling real-time object detection on low cost FPGAs

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now