
An efficient and lightweight small target detection framework for vision-based autonomous road cleaning

Published in: Multimedia Tools and Applications

Abstract

The development of machine vision technology offers a feasible path for intelligent road-cleaning vehicles to achieve automation, improve cleaning efficiency, and reduce energy consumption. This paper proposes an efficient and lightweight small-target detection framework. A cascade model is formed by attaching a refined model for category correction to the back end of a general garbage-detection model, enabling high-precision detection of garbage in road scenes, and a road segmentation model is proposed to determine the operating area of the cleaning vehicle. For the road segmentation model, a category-based loss function is introduced to improve the recall of difficult categories such as sidewalk and foreground. In addition, we reparameterize the network structure in the inference phase, yielding a lightweight model that improves computational efficiency and reduces the demand for computational resources. Experimental results show that the proposed road segmentation model achieves a good trade-off between accuracy and speed compared with state-of-the-art models. The cascade model corrects the categories of certain low-confidence garbage targets and thereby improves road-garbage detection. Compared with using YOLOv5s alone, our cascade model raises the average recall for road garbage from 77.8% to 81.4% and the average precision from 69.5% to 83.1%, and it also detects small-sized garbage more reliably.
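The inference-time reparameterization mentioned in the abstract follows the RepVGG-style idea of collapsing a multi-branch training block (parallel 3×3 convolution, 1×1 convolution, and identity shortcut) into a single 3×3 convolution by merging the branch weights algebraically. The NumPy sketch below illustrates only this kernel-fusion identity; it omits batch-norm folding, which the full method also performs, and all function names here are illustrative rather than taken from the authors' code.

```python
import numpy as np

def conv2d(x, w, pad):
    """Direct stride-1 2D convolution; x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1], x.shape[2]
    y = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                y[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return y

def fuse_branches(w3, w1):
    """Collapse parallel 3x3, 1x1 and identity branches into one 3x3 kernel."""
    w1_as_3x3 = np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1)))  # centre the 1x1 weights
    identity = np.zeros_like(w3)
    for c in range(w3.shape[0]):
        identity[c, c, 1, 1] = 1.0  # identity branch requires C_in == C_out
    return w3 + w1_as_3x3 + identity

# Demo: the multi-branch output equals one pass through the fused kernel.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w3 = rng.standard_normal((4, 4, 3, 3))
w1 = rng.standard_normal((4, 4, 1, 1))
y_branches = conv2d(x, w3, pad=1) + conv2d(x, w1, pad=0) + x
y_fused = conv2d(x, fuse_branches(w3, w1), pad=1)
assert np.allclose(y_branches, y_fused)
```

Because the fused form runs one convolution instead of three branches plus an addition, the deployed model is both faster and lighter while producing numerically identical outputs, which is the trade-off the paper exploits at inference time.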


(Figures 1–12 appear in the full article.)


Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

We gratefully acknowledge the support of the Science and Technology Department of Hubei Province (project 2019AAA057) and of the State Grid Information and Telecommunication Branch (grant 5700-202325308A-1-1-ZN).

Author information


Corresponding author

Correspondence to Danhua Cao.

Ethics declarations

Ethics approval

Not applicable.

Conflict of interest

The authors declare that they have no conflicts of interest related to this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hu, C., Ni, M. & Cao, D. An efficient and lightweight small target detection framework for vision-based autonomous road cleaning. Multimed Tools Appl 83, 88587–88612 (2024). https://doi.org/10.1007/s11042-024-18585-2

