High-speed YOLOv4-tiny hardware accelerator for self-driving automotive

Valadanzoj, Zahra; Daryanavard, Hassan; Harifi, Abbas

doi:10.1007/s11227-023-05713-2

Published: 27 October 2023

Volume 80, pages 6699–6724, (2024)

The Journal of Supercomputing

Abstract

Object detection is an important task in self-driving vehicles. The YOLO algorithm, together with an efficient embedded implementation, is a promising solution for object detection. This paper presents a novel hardware implementation of YOLOv4-tiny object detection on an FPGA. Because the YOLO network involves many calculations and parameters, 8-bit and 5-bit fixed-point formats for data and weights are proposed to reduce resource and memory usage. To compensate for the resulting loss of accuracy, the best decimal point position for the different layers is found with a Genetic Algorithm during network quantization. In addition, a technique is presented for performing two multiplications with completely different operands simultaneously on a single DSP block, which increases the network execution speed by 1.8 times. The design was implemented on the Xilinx Zynq ZC706 FPGA. The accelerator executes YOLOv4-tiny at a resolution of 416 × 416 at 55 FPS and achieves an accuracy of 79%. Compared to the state of the art, the FPS increases by 13% while the accuracy decreases by only 3%, and the proposed scheme uses fewer DSPs, which indicates better resource utilization than previous works.
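
The idea of a per-layer decimal point position can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (quantize, best_frac_bits), the signed two's-complement range, and the squared-error criterion are all assumptions made for illustration, and a plain exhaustive search stands in for the Genetic Algorithm, whose fitness function is not given in the abstract.

```python
def quantize(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value (hypothetical helper).

    total_bits is the word length (e.g. 8 or 5); frac_bits is the number
    of bits to the right of the decimal point. Values outside the
    representable range saturate instead of wrapping around.
    """
    scale = 2 ** frac_bits
    lowest_code = -(2 ** (total_bits - 1))
    highest_code = 2 ** (total_bits - 1) - 1
    code = max(lowest_code, min(highest_code, round(x * scale)))
    return code / scale


def best_frac_bits(values, total_bits):
    """Pick the decimal point position with the lowest squared error.

    Assumption: exhaustive search over 0..total_bits-1 fractional bits,
    used here in place of the Genetic Algorithm named in the abstract.
    """
    def squared_error(frac_bits):
        return sum((v - quantize(v, total_bits, frac_bits)) ** 2 for v in values)

    return min(range(total_bits), key=squared_error)


if __name__ == "__main__":
    layer_weights = [3.0, -2.5, 1.25]  # made-up example values
    position = best_frac_bits(layer_weights, total_bits=5)
    print(position, [quantize(w, 5, position) for w in layer_weights])
```

A layer with larger weights ends up with fewer fractional bits, while a layer with small weights keeps more of them, which is why one global decimal point position would cost accuracy at such short word lengths.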

Data availability

All of the material is owned by the authors and/or no permissions are required.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Iran
Zahra Valadanzoj, Hassan Daryanavard & Abbas Harifi

Contributions

Dr. HD proposed the ideas of the paper and wrote the main manuscript text. Mrs. ZV carried out the simulations, and Dr. AH did the final editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hassan Daryanavard.

Ethics declarations

Competing interests

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Valadanzoj, Z., Daryanavard, H. & Harifi, A. High-speed YOLOv4-tiny hardware accelerator for self-driving automotive. J Supercomput 80, 6699–6724 (2024). https://doi.org/10.1007/s11227-023-05713-2

Accepted: 06 October 2023
Published: 27 October 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11227-023-05713-2