Abstract
The deep-learning-based object detector YOLOv5 suffers from a large parameter count and a complex structure, which hinder its deployment on mobile devices with limited computational power and storage. To address these limitations, we introduce a lightweight object detection algorithm that combines the coordinate attention (CA) mechanism with the YOLOv5 framework. We embed the CA mechanism into MobileNetv2 to build MobileNetv2-CA, which replaces CSPDarkNet53 as YOLOv5's backbone network; this reduces the model's parameter count while maintaining competitive accuracy. To further improve performance, we propose a multi-scale fast spatial pyramid pooling (MSPPF) layer, designed to speed up and refine the model's handling of varied input image sizes. Complementing this, we incorporate MPANet, a feature fusion network composed of redesigned upsampling and downsampling modules together with feature extraction cells, which raises detection precision with minimal parameter overhead. Empirical results confirm the effectiveness of our method: it achieves a mean average precision (mAP) of 87.6% on the PASCAL VOC07+12 dataset and an average precision (AP) of 39.4% on the MS COCO dataset, with a model size of only 10.1 MB. Compared with the original YOLOv5, the proposed model reduces parameters by 76.9% and runs 1.72 times faster, reaching 54.9 frames per second (FPS) on an NVIDIA RTX 3060. Against state-of-the-art techniques, our method strikes a favorable balance between accuracy and real-time performance.
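The fast pyramid-pooling idea the MSPPF layer builds on can be illustrated with a minimal sketch. The exact MSPPF design is not detailed in this abstract; the pure-Python 1-D example below only demonstrates the standard SPPF equivalence from YOLOv5, in which cascading small stride-1 max pools reproduces the larger pooling kernels of classic SPP at lower cost:

```python
def maxpool1d(x, k):
    """Stride-1 max pool with 'same' padding (pad = k // 2) over a 1-D list."""
    pad = k // 2
    padded = [float("-inf")] * pad + list(x) + [float("-inf")] * pad
    return [max(padded[i:i + k]) for i in range(len(x))]

def sppf(x, k=5):
    """SPPF layout: concatenate the input with three cascaded k-pools.
    In the real CNN this concatenation is channel-wise on feature maps."""
    y1 = maxpool1d(x, k)
    y2 = maxpool1d(y1, k)
    y3 = maxpool1d(y2, k)
    return x + y1 + y2 + y3

x = [3, 1, 4, 1, 5, 9, 2, 6]
# Two cascaded k=5 pools cover the same receptive field as one k=9 pool,
# and three cascades match k=13 -- the multi-scale pools of SPP.
assert maxpool1d(maxpool1d(x, 5), 5) == maxpool1d(x, 9)
```

Because each stride-1 pool with kernel k widens the receptive field by k - 1, the cascade reuses intermediate results instead of recomputing a large pool per scale, which is the source of SPPF's speedup over SPP.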








Availability of data and materials
Research data are not shared.
Funding
This work was supported by the Key-Area Research and Development Program of Guangdong Province (Grant No. 2020B0909020001) and the National Natural Science Foundation of China (Grant No. 61573113).
Author information
Contributions
Shuai Feng wrote the main manuscript text. Wenna Wang, Huilin Wang, and Huaming Qian revised the language. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
This declaration is not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Feng, S., Qian, H., Wang, H. et al. Real-time object detection method based on YOLOv5 and efficient mobile network. J Real-Time Image Proc 21, 56 (2024). https://doi.org/10.1007/s11554-024-01433-9