Abstract
Machine vision offers a feasible route to automating road-cleaning vehicles, improving cleaning efficiency, and reducing energy consumption. This paper proposes an efficient, lightweight small-target detection framework with two components. First, a cascade model appends a refined model for category correction to the back end of a general garbage detection model, enabling high-precision detection of garbage in road scenes. Second, a road segmentation model determines the operating area of the cleaning vehicle. For the road segmentation model, we propose a category-based loss function that improves the recall rate of difficult categories such as sidewalk and foreground. In addition, we reparameterize the network structure in the inference phase, which yields a lighter model that improves computational efficiency and reduces the demand for computational resources. The experimental results show that the proposed road segmentation model achieves a good trade-off between accuracy and speed relative to the state of the art. The cascade model corrects the category of some low-confidence garbage targets and thereby improves road garbage detection. Compared with using YOLOv5s alone, the proposed cascade model raises the average recall rate for road garbage from 77.8% to 81.4% and the average accuracy rate from 69.5% to 83.1%, and it also performs better on small-sized garbage.
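The category-correction step of the cascade can be illustrated with a minimal sketch. It assumes that the general detector returns boxes with a label and a confidence, and that only detections below a confidence threshold are passed to the second-stage model. The names `Detection`, `cascade_correct`, and `toy_refiner`, the threshold of 0.5, and the rule for accepting the refined label are all hypothetical choices made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    box: Box
    label: str
    confidence: float


def cascade_correct(
    detections: List[Detection],
    refine: Callable[[Box], Tuple[str, float]],
    low_conf: float = 0.5,  # assumed threshold, not from the paper
) -> List[Detection]:
    """Re-label low-confidence detections with a second-stage classifier."""
    corrected = []
    for det in detections:
        if det.confidence >= low_conf:
            # Confident detections pass through unchanged.
            corrected.append(det)
            continue
        new_label, new_conf = refine(det.box)
        # Assumed acceptance rule: keep the refined result only if the
        # second stage is more confident than the first.
        if new_conf > det.confidence:
            det = replace(det, label=new_label, confidence=new_conf)
        corrected.append(det)
    return corrected


def toy_refiner(box: Box) -> Tuple[str, float]:
    # Stand-in for the refined model: labels small boxes "cigarette_butt".
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    return ("cigarette_butt", 0.9) if area < 400 else ("bottle", 0.6)


# Hypothetical output of the general detector (e.g. a YOLOv5s-style model).
general_output = [
    Detection((10, 10, 60, 90), "bottle", 0.92),
    Detection((100, 100, 115, 110), "leaf", 0.31),
]
result = cascade_correct(general_output, toy_refiner)
```

In a real pipeline, `refine` would crop the image region inside the box and run the refined classification model on it; here it only inspects the box size so that the sketch stays self-contained.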












Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
We gratefully acknowledge support from the Science and Technology Department of Hubei Province (project 2019AAA057) and from the State Grid Information and Telecommunication Branch (No. 5700-202325308A-1-1-ZN).
Ethics declarations
Ethics approval
Not applicable.
Conflict of interest
The authors declare that they have no conflicts of interest related to this paper.
About this article
Cite this article
Hu, C., Ni, M. & Cao, D. An efficient and lightweight small target detection framework for vision-based autonomous road cleaning. Multimed Tools Appl 83, 88587–88612 (2024). https://doi.org/10.1007/s11042-024-18585-2
DOI: https://doi.org/10.1007/s11042-024-18585-2