
A novel finetuned YOLOv6 transfer learning model for real-time object detection

  • Original Research Paper
  • Journal of Real-Time Image Processing

A Correction to this article was published on 09 May 2023


Abstract

Object detection and object recognition are among the most important applications of computer vision. Performing object detection efficiently requires a model with high detection accuracy, but increasing a model's accuracy typically increases its size and computational cost, which makes deploying deep learning in embedded environments challenging. To address this problem, this study proposes a transfer-learning-based model for real-time object detection that improves the effectiveness of the YOLO algorithm, using YOLOv6 as the baseline model. A pruning-and-finetuning algorithm and a transfer learning algorithm are proposed to improve the model's efficiency in terms of detection accuracy and inference speed. The paper also describes how the proposed model identifies all objects in a scene, indoor as well as outdoor, and provides voice output to warn the user about nearby and faraway objects; the audio feedback is generated with the Google Text-to-Speech (gTTS) library. The model is trained on the MS-COCO dataset and compared with the TensorFlow Single Shot Detector (SSD), Faster R-CNN, Mask R-CNN, YOLOv4, and baseline YOLOv6 models. After pruning the YOLOv6 baseline model by 30%, 40%, and 50%, the finetuned YOLOv6 framework achieves 37.8% average precision (AP) at 1235 frames per second (FPS).
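The pruning-and-finetuning step summarized above can be illustrated with a short, hedged sketch. The snippet below is not the paper's exact algorithm: it assumes a generic PyTorch detection network and uses torch.nn.utils.prune to apply global L1 magnitude pruning at a chosen ratio (0.3, 0.4, or 0.5 for the 30%, 40%, and 50% settings reported above), followed by a brief finetuning pass. The names `model`, `train_loader`, and `criterion` are illustrative placeholders, and the loss computation is schematic.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def prune_and_finetune(model, train_loader, criterion, amount=0.4,
                       epochs=3, lr=1e-4, device="cuda"):
    """Globally prune `amount` of conv weights by L1 magnitude, then finetune.

    Illustrative sketch only; the paper's exact pruning procedure may differ.
    """
    model.to(device)

    # Collect every Conv2d weight tensor as a pruning target.
    targets = [(m, "weight") for m in model.modules()
               if isinstance(m, nn.Conv2d)]

    # Zero out the smallest-magnitude weights across all conv layers at once.
    prune.global_unstructured(targets,
                              pruning_method=prune.L1Unstructured,
                              amount=amount)

    # Short finetuning pass to recover the accuracy lost to pruning.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels)
            loss.backward()
            optimizer.step()

    # Make the pruning permanent by removing the re-parametrization masks.
    for module, name in targets:
        prune.remove(module, name)
    return model
```

Calling `prune_and_finetune(model, train_loader, criterion, amount=0.3)` would correspond to the 30% setting; note that unstructured pruning leaves layer shapes intact, whereas structured (channel) pruning would be needed to actually shrink the network.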
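The voice feedback described in the abstract can likewise be sketched with the gTTS library. The detection tuple format, the distance threshold, and the output file name below are assumptions made purely for illustration.

```python
from gtts import gTTS


def speak_detections(detections, near_threshold_m=2.0, out_path="warning.mp3"):
    """Convert (label, distance_in_metres) detections into a spoken warning.

    Illustrative only; the real system's detection format and thresholds may differ.
    """
    phrases = []
    for label, distance in detections:
        proximity = "nearby" if distance < near_threshold_m else "far away"
        phrases.append(f"{label} {proximity}")
    if not phrases:
        return None

    message = "Warning: " + ", ".join(phrases)
    gTTS(text=message, lang="en").save(out_path)  # synthesize speech to an MP3 file
    return out_path


# Example: speak_detections([("person", 1.2), ("car", 8.5)])
# -> saves "Warning: person nearby, car far away" as warning.mp3
```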


Data availability

Data will be made available upon reasonable request.

Change history


Author information


Contributions

All authors contributed equally to this manuscript.

Corresponding author

Correspondence to Jyotir Moy Chatterjee.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: the affiliation of author Jyotir Moy Chatterjee was stated incorrectly. It has been corrected.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Gupta, C., Gill, N.S., Gulia, P. et al. A novel finetuned YOLOv6 transfer learning model for real-time object detection. J Real-Time Image Proc 20, 42 (2023). https://doi.org/10.1007/s11554-023-01299-3


  • DOI: https://doi.org/10.1007/s11554-023-01299-3
