
An efficient and lightweight small target detection framework for vision-based autonomous road cleaning

Published in: Multimedia Tools and Applications

Abstract

The development of machine vision technology offers a feasible path for intelligent road-cleaning vehicles to achieve automation, improve cleaning efficiency, and reduce energy consumption. This paper proposes an efficient and lightweight small-target detection framework. A cascade model is formed by attaching a refined model for category correction to the back end of a general garbage-detection model, enabling high-precision detection of garbage in road scenes, and a road segmentation model is proposed to determine the operating area of the cleaning vehicle. For the road segmentation model, a category-based loss function is introduced to improve the recall of difficult categories such as sidewalk and foreground. In addition, we reparameterize the network structure in the inference phase, yielding a lightweight model that improves computational efficiency and reduces the demand for computational resources. Experimental results show that the proposed road segmentation model achieves a good trade-off between accuracy and speed compared with state-of-the-art models. The cascade model corrects the categories of certain low-confidence garbage targets and thereby improves road-garbage detection. Compared with using YOLOv5s alone, our cascade model raises the average recall for road garbage from 77.8% to 81.4% and the average precision from 69.5% to 83.1%, and it also detects small-sized garbage more reliably.
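The inference-time reparameterization mentioned in the abstract follows the RepVGG-style idea of collapsing a multi-branch training block (parallel 3×3 convolution, 1×1 convolution, and identity shortcut) into a single 3×3 convolution by merging the branch weights algebraically. The NumPy sketch below illustrates only this kernel-fusion identity; it omits batch-norm folding, which the full method also performs, and all function names here are illustrative rather than taken from the authors' code.

```python
import numpy as np

def conv2d(x, w, pad):
    """Direct stride-1 2D convolution; x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1], x.shape[2]
    y = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                y[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return y

def fuse_branches(w3, w1):
    """Collapse parallel 3x3, 1x1 and identity branches into one 3x3 kernel."""
    w1_as_3x3 = np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1)))  # centre the 1x1 weights
    identity = np.zeros_like(w3)
    for c in range(w3.shape[0]):
        identity[c, c, 1, 1] = 1.0  # identity branch requires C_in == C_out
    return w3 + w1_as_3x3 + identity

# Demo: the multi-branch output equals one pass through the fused kernel.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w3 = rng.standard_normal((4, 4, 3, 3))
w1 = rng.standard_normal((4, 4, 1, 1))
y_branches = conv2d(x, w3, pad=1) + conv2d(x, w1, pad=0) + x
y_fused = conv2d(x, fuse_branches(w3, w1), pad=1)
assert np.allclose(y_branches, y_fused)
```

Because the fused form runs one convolution instead of three branches plus an addition, the deployed model is both faster and lighter while producing numerically identical outputs, which is the trade-off the paper exploits at inference time.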


(Figures 1–12 appear in the full article.)


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

We gratefully acknowledge the support of the Science and Technology Department of Hubei Province (project 2019AAA057) and of the State Grid Information and Telecommunication Branch (grant 5700-202325308A-1-1-ZN).

Author information


Corresponding author

Correspondence to Danhua Cao.

Ethics declarations

Ethics approval

Not applicable.

Conflict of interest

The authors declare that they have no conflicts of interest related to this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hu, C., Ni, M. & Cao, D. An efficient and lightweight small target detection framework for vision-based autonomous road cleaning. Multimed Tools Appl 83, 88587–88612 (2024). https://doi.org/10.1007/s11042-024-18585-2

