High-speed YOLOv4-tiny hardware accelerator for self-driving automotive

The Journal of Supercomputing

Abstract

Object detection is an important task for self-driving vehicles. The YOLO algorithm, combined with an efficient embedded implementation, is a promising solution for object detection. In this paper, we present a novel FPGA implementation of the YOLOv4-tiny object detector. Because the YOLO network involves many computations and parameters, we propose 8-bit and 5-bit fixed-point formats for data and weights, respectively, to reduce resource and memory usage. To compensate for the resulting loss of accuracy, a genetic algorithm is used during network quantization to find the best binary-point position in each layer. We also present a technique that performs two multiplications with completely different operands simultaneously in a single DSP block, which increases network execution speed by a factor of 1.8. We implemented our design on the Xilinx Zynq ZC706 FPGA board. Our accelerator runs YOLOv4-tiny at an input resolution of 416 × 416 at 55 FPS with an accuracy of 79%. Compared with the state of the art, our design achieves 13% higher FPS at the cost of only a 3% drop in accuracy, and it uses fewer DSPs, indicating better resource utilization than previous works.
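The abstract states only that data use 8-bit and weights 5-bit fixed-point formats, with the binary-point position in each layer chosen by a genetic algorithm. The following is a minimal sketch of that idea under stated assumptions, not the authors' method. It assumes signed two's-complement values with saturation. Each gene is the number of fractional bits for one layer. The algorithm uses elitist selection, one-point crossover and uniform mutation. Fitness is the mean squared quantization error. The paper most likely scores candidates by detection accuracy instead, so this fitness is only a stand-in. All function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def quantize(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x to signed two's-complement fixed point with `total_bits` bits,
    `frac_bits` of them after the binary point; out-of-range values saturate."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))  # round to nearest, then clamp
    return q / scale


def layer_error(values: Sequence[float], total_bits: int, frac_bits: int) -> float:
    """Mean squared quantization error for one layer's values."""
    return sum((v - quantize(v, total_bits, frac_bits)) ** 2 for v in values) / len(values)


def fitness(genome: Sequence[int], layers: Sequence[Sequence[float]], total_bits: int) -> float:
    """Total error over all layers; lower is better (assumed stand-in for accuracy)."""
    return sum(layer_error(vals, total_bits, f) for vals, f in zip(layers, genome))


def genetic_search(layers: Sequence[Sequence[float]], total_bits: int,
                   frac_range: Tuple[int, int], pop_size: int = 20,
                   generations: int = 40, mutation_rate: float = 0.2,
                   seed: int = 0) -> List[int]:
    """Pick one fractional-bit count per layer with a simple genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = frac_range
    n = len(layers)
    pop = [[rng.randint(lo, hi) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, layers, total_bits))
        parents = pop[: pop_size // 2]  # elitism: keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: each gene is redrawn uniformly with probability mutation_rate.
            child = [rng.randint(lo, hi) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda g: fitness(g, layers, total_bits))
```

Consider hypothetical weights `[[0.05, -0.08, 0.1, -0.02], [2.5, -3.1, 1.7, 3.4]]`, searched with `genetic_search(layers, 5, (0, 7))`. The search should give more fractional bits to the small-valued first layer. The second layer needs integer bits so that values up to 3.4 do not saturate. The same `quantize` function covers 8-bit activations by passing `total_bits=8`.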
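The abstract does not explain how two multiplications with completely different operands share one DSP block. Operand packing is one general way to get two independent products from a single wide multiplication. It relies on this identity: $(a \cdot 2^k + b)(c \cdot 2^k + d) = ac \cdot 2^{2k} + (ad + bc) \cdot 2^k + bd$. If $bd$ and the cross term $ad + bc$ both fit below $2^k$, then $ac$ and $bd$ can be read from separate bit fields.

The sketch below illustrates this arithmetic only, under assumptions. Operands are unsigned, with 8-bit data and 5-bit weights, and the guard field is k = 14 bits. It does not model the 25 × 18 multiplier ports of the Zynq-7000 DSP48E1; here the packed weight operand needs 19 bits. It also omits the two's-complement correction a real mapping needs. It is not presented as the authors' scheme, and the function name is hypothetical.

```python
from typing import Tuple


def packed_products(a: int, b: int, c: int, d: int,
                    data_bits: int = 8, weight_bits: int = 5, k: int = 14) -> Tuple[int, int]:
    """Return (a*c, b*d) computed with ONE multiplication of packed operands.
    Assumes unsigned a, b (data) and c, d (weights)."""
    max_data = (1 << data_bits) - 1
    max_weight = (1 << weight_bits) - 1
    for name, v, m in (("a", a, max_data), ("b", b, max_data),
                       ("c", c, max_weight), ("d", d, max_weight)):
        if not 0 <= v <= m:
            raise ValueError(f"{name}={v} outside unsigned range [0, {m}]")
    # Guard bits: the cross term a*d + b*c (and hence b*d) must stay below 2**k,
    # otherwise it would carry into the neighbouring field.
    if 2 * max_data * max_weight >= (1 << k):
        raise ValueError(f"k={k} leaves too few guard bits")
    x = (a << k) + b  # a in the high field, b in the low field
    y = (c << k) + d  # c in the high field, d in the low field
    p = x * y         # single product: ac*2^(2k) + (ad+bc)*2^k + bd
    bd = p & ((1 << k) - 1)  # low field
    ac = p >> (2 * k)        # high field; the cross term in between is discarded
    return ac, bd
```

For example, `packed_products(3, 200, 17, 9)` returns `(51, 1800)`. The guard check is the key design constraint: wider operands need a larger k, and that quickly exceeds the multiplier's port widths.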

Data availability

All of the material is owned by the authors and/or no permissions are required.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author information

Contributions

Dr. HD proposed the ideas of the paper. Mrs. ZV performed the simulations, Dr. HD wrote the main manuscript text, and Dr. AH carried out the final editing. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hassan Daryanavard.

Ethics declarations

Competing interests

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Valadanzoj, Z., Daryanavard, H. & Harifi, A. High-speed YOLOv4-tiny hardware accelerator for self-driving automotive. J Supercomput 80, 6699–6724 (2024). https://doi.org/10.1007/s11227-023-05713-2

  • DOI: https://doi.org/10.1007/s11227-023-05713-2