
Real-time object detection method based on YOLOv5 and efficient mobile network

  • Research
  • Published in: Journal of Real-Time Image Processing

Abstract

The deep-learning-based object detector YOLOv5 suffers from an excessive parameter count and an overly complex structure, which hinder its deployment on mobile devices with limited computational and storage capacity. To address these limitations, we propose a lightweight object detection algorithm that combines the coordinate attention (CA) mechanism with the YOLOv5 framework. We embed CA into MobileNetv2 to build MobileNetv2-CA, which replaces CSPDarkNet53 as the YOLOv5 backbone, reducing the parameter count while preserving competitive accuracy. We further propose a multi-scale fast spatial pyramid pooling (MSPPF) layer that speeds up and refines the model's handling of varying input image sizes, and MPANet, a feature fusion network built from redesigned upsampling and downsampling modules together with feature extraction cells, which raises detection precision at minimal parameter cost. Our method achieves a mean average precision (mAP) of 87.6% on the PASCAL VOC07+12 dataset and an average precision (AP) of 39.4% on the MS COCO dataset, with a model size of only 10.1 MB. Compared with the original YOLOv5, the proposed model reduces parameters by 76.9% and runs 1.72 times faster, reaching 54.9 frames per second (FPS) on an NVIDIA RTX 3060. Against state-of-the-art methods, it offers a favorable balance between accuracy and real-time performance.
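The coordinate attention mechanism at the core of MobileNetv2-CA factorizes global pooling into two directional pools, producing separate per-row and per-column attention weights. The sketch below illustrates this idea on a single feature map; it is a simplified NumPy illustration, not the paper's implementation (assumptions: plain sigmoid instead of hard-sigmoid, no batch normalization, and random matrices standing in for the learned 1x1 convolutions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def coordinate_attention(x, w_mid, w_h, w_w):
    """Coordinate attention (Hou et al., CVPR 2021), simplified, on one
    feature map x of shape (C, H, W). The 1x1 convolutions are expressed
    as plain matrix products: w_mid is (C_mid, C); w_h and w_w are (C, C_mid)."""
    C, H, W = x.shape
    # Factorized global pooling: average along W gives a per-row code,
    # average along H gives a per-column code.
    pool_h = x.mean(axis=2)                        # (C, H)
    pool_w = x.mean(axis=1)                        # (C, W)
    # Concatenate along the spatial axis and apply the shared transform.
    y = np.concatenate([pool_h, pool_w], axis=1)   # (C, H + W)
    y = np.maximum(w_mid @ y, 0.0)                 # ReLU, (C_mid, H + W)
    # Split back and produce directional attention weights in (0, 1).
    a_h = sigmoid(w_h @ y[:, :H])                  # (C, H)
    a_w = sigmoid(w_w @ y[:, H:])                  # (C, W)
    # Reweight every position by its row attention and column attention.
    return x * a_h[:, :, None] * a_w[:, None, :]

rng = np.random.default_rng(0)
C, H, W, C_mid = 8, 6, 6, 4
x = rng.standard_normal((C, H, W))
out = coordinate_attention(
    x,
    rng.standard_normal((C_mid, C)),
    rng.standard_normal((C, C_mid)),
    rng.standard_normal((C, C_mid)),
)
print(out.shape)
```

Because both attention maps lie in (0, 1), the module can only attenuate activations, and it preserves the input shape, which is what lets it be inserted into MobileNetv2 bottlenecks without changing the surrounding architecture.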


Availability of data and materials

Research data are not shared.


Funding

This work was supported by the Key-Area Research and Development Program of Guangdong Province under Grant no. 2020B0909020001 and the National Natural Science Foundation of China under Grant no. 61573113.

Author information

Authors and Affiliations

Authors

Contributions

Shuai Feng wrote the main manuscript text. Wenna Wang, Huilin Wang, and Huaming Qian revised the language. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huaming Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethics approval

This declaration is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Feng, S., Qian, H., Wang, H. et al. Real-time object detection method based on YOLOv5 and efficient mobile network. J Real-Time Image Proc 21, 56 (2024). https://doi.org/10.1007/s11554-024-01433-9
