Abstract
Floating debris is a prominent indicator of water quality. However, traditional object detection algorithms cannot achieve high accuracy in such complex environments, and many deep learning-based detectors are too slow on embedded devices with limited computing power. To address these issues, this paper proposes TC-YOLOv5, which improves detection accuracy by integrating the convolutional block attention module (CBAM) and a vision transformer. To keep the algorithm efficient and resource-light, we selectively remove some convolutional layers and eliminate redundant computation. We evaluated TC-YOLOv5 on a dataset containing multiple species of floating debris: it processes an image in 1.18 s on average on a Raspberry Pi 4B and achieves a mean average precision (mAP@0.5) of 84.2%. Its detection accuracy, speed, and floating-point operations (FLOPs) compare favorably with other YOLOv5 variants, including YOLOv5n, YOLOv5s, and YOLOv5m. These results show that TC-YOLOv5 achieves accurate and rapid detection with low resource consumption.
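The convolutional block attention module (CBAM) that the abstract integrates into YOLOv5 applies channel attention followed by spatial attention to a feature map. The following is a minimal NumPy sketch of that idea, not the paper's implementation: the MLP weights are random placeholders, and the learned 7×7 spatial convolution of the original CBAM is replaced here by a fixed mean filter for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Reweight channels of x (C, H, W) using a shared two-layer MLP
    applied to global average- and max-pooled descriptors (as in CBAM)."""
    avg = x.mean(axis=(1, 2))                      # (C,) average-pooled
    mx = x.max(axis=(1, 2))                        # (C,) max-pooled
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)   # shared MLP, ReLU hidden
                  + w2 @ np.maximum(w1 @ mx, 0.0))
    return x * att[:, None, None]                  # scale each channel

def spatial_attention(x, k=7):
    """Reweight spatial positions; the learned k x k conv of CBAM is
    stood in for by a fixed mean filter over the pooled channel maps."""
    m = 0.5 * (x.mean(axis=0) + x.max(axis=0))     # (H, W) channel pooling
    p = k // 2
    padded = np.pad(m, p, mode="edge")
    h, w = m.shape
    out = np.empty_like(m)
    for i in range(h):                             # naive k x k mean filter
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return x * sigmoid(out)[None, :, :]            # scale each position

def cbam(x, w1, w2):
    """Channel attention, then spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2))

# Demo on a random 8-channel 5x5 feature map with placeholder weights
# (reduction ratio r = 4, so the MLP bottleneck has 8 // 4 = 2 units).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))
w1 = 0.1 * rng.standard_normal((2, 8))
w2 = 0.1 * rng.standard_normal((8, 2))
y = cbam(x, w1, w2)
```

Because both attention maps pass through a sigmoid, every output value is a damped copy of the input, so the module preserves the feature map's shape while emphasizing informative channels and regions.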
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Cite this article
Li, S., Liu, S., Cai, Z. et al. TC-YOLOv5: rapid detection of floating debris on raspberry Pi 4B. J Real-Time Image Proc 20, 17 (2023). https://doi.org/10.1007/s11554-023-01265-z