Abstract
Object detection is an important task in self-driving vehicles. The YOLO algorithm, which lends itself well to embedded implementation, is a promising solution for object detection. In this paper, a novel hardware implementation of YOLOv4-tiny object detection on FPGA is presented. Since the YOLO network involves many calculations and parameters, 8-bit and 5-bit fixed-point formats are proposed for data and weights, respectively, to reduce resource and memory usage. To compensate for the resulting loss of accuracy, the best decimal point position in each layer is found using a Genetic Algorithm during network quantization. In addition, a technique is presented for performing two multiplications with completely different operands simultaneously in one DSP block, which speeds up network execution by a factor of 1.8. We implemented our design on the Xilinx Zynq ZC706 FPGA. Our accelerator executes YOLOv4-tiny at a resolution of 416 × 416 at 55 FPS and achieves an accuracy of 79%. Compared to the state of the art, the FPS increases by 13% while the accuracy decreases by only 3%. The proposed scheme also uses fewer DSPs, which indicates better resource utilization than previous works.
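As a rough illustration of the per-layer decimal point selection described in the abstract, the following Python sketch quantizes one layer's weights to a signed 5-bit fixed-point format and picks the number of fractional bits that minimizes the mean squared quantization error. It is not the authors' implementation: the paper searches the positions for all layers with a Genetic Algorithm, whereas this sketch substitutes a plain exhaustive search over a single layer. The function names, the sample weights, and the `max_frac_bits` bound are assumptions made for the example.

```python
def quantize(x, total_bits, frac_bits):
    """Round x to signed fixed point with frac_bits fractional bits, saturating."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))      # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1   # most positive integer code
    code = max(q_min, min(q_max, round(x * scale)))
    return code / scale


def quantization_error(values, total_bits, frac_bits):
    """Mean squared error between the values and their quantized versions."""
    return sum(
        (v - quantize(v, total_bits, frac_bits)) ** 2 for v in values
    ) / len(values)


def best_frac_bits(values, total_bits, max_frac_bits=7):
    """Exhaustively pick the fractional-bit count with the lowest error.

    Stand-in for the Genetic Algorithm mentioned in the abstract; the
    search bound max_frac_bits is an assumption for this example.
    """
    return min(
        range(max_frac_bits + 1),
        key=lambda f: quantization_error(values, total_bits, f),
    )


# Hypothetical weights for one layer (illustrative values only).
layer_weights = [0.31, -0.12, 0.05, -0.44, 0.27]
chosen = best_frac_bits(layer_weights, total_bits=5)
print("fractional bits:", chosen)
print("quantized:", [quantize(w, 5, chosen) for w in layer_weights])
```

Too few fractional bits give a coarse step size, while too many shrink the representable range and saturate the larger weights, which is why the best position differs from layer to layer. An exhaustive search is only practical per layer in isolation; searching all layers jointly against detection accuracy is the kind of problem for which a heuristic such as a Genetic Algorithm is typically used.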
Data availability
All of the material is owned by the authors and/or no permissions are required.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Contributions
Dr. HD proposed the ideas of the paper and wrote the main manuscript text. Mrs. ZV performed the simulations, and Dr. AH did the final editing. All authors reviewed the manuscript.
Ethics declarations
Competing interests
We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Valadanzoj, Z., Daryanavard, H. & Harifi, A. High-speed YOLOv4-tiny hardware accelerator for self-driving automotive. J Supercomput 80, 6699–6724 (2024). https://doi.org/10.1007/s11227-023-05713-2
DOI: https://doi.org/10.1007/s11227-023-05713-2