Research article. DOI: 10.1145/3647649.3647712

Spatial-Channel Specific Snake-YOLOv8 for Video Logo Detection in Live Streaming Scenes

Published: 03 May 2024

ABSTRACT

Live video platforms have attracted many active streamers and daily users, and quickly understanding live streaming scenes is crucial for maintaining a clean and healthy cyberspace. Video logos often appear in live video and can serve as key clues for understanding these scenes. Because live streaming scenes contain jitter as well as blurred and deformed logos, we propose a spatial-channel specific Snake-YOLOv8 for video logo detection in live streaming scenes. First, we design an involutional fusion layer that fuses interframe logo features to handle jitter scenes. Then, we propose a spatial-channel specific involution that extracts spatial and channel context so that detection adapts to blurred logos. Finally, to cope with deformed logos in live streaming scenes, we embed snake deformation convolution into the spatial-channel specific involution. Experimental results show that, at an inference speed of 39.8 FPS, the proposed method reaches an mAP of 69.7% on LogoDet-3K and 53.0% on the self-built BJUT-VLD dataset, indicating that it is effective for video logo detection in live streaming scenes.
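
For readers unfamiliar with involution, the following is a minimal pure-Python sketch of the plain involution operator that the proposed layers build on. It is not the paper's spatial-channel specific variant, its fusion layer, or its snake convolution. The sketch assumes a single kernel group, zero padding, and a linear kernel generator; the names `generate_kernel`, `involution`, and `weights` are illustrative and do not come from the paper.

```python
# Hedged sketch of a plain involution operator (an assumption-laden
# illustration, not the authors' implementation).
# Feature maps are nested lists indexed as x[channel][row][col].


def generate_kernel(pixel, weights):
    """Hypothetical kernel generator.

    Maps the channel values at one pixel to a flattened k*k kernel with a
    linear projection (the role 1x1 convolutions usually play in practice).
    `weights` has k*k rows, each with one coefficient per input channel.
    """
    return [sum(w * v for w, v in zip(row, pixel)) for row in weights]


def involution(x, k, weights):
    """Apply a k x k involution with zero padding and one kernel group."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    pad = k // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            # The kernel depends on the features at this location ...
            pixel = [x[c][i][j] for c in range(channels)]
            kernel = generate_kernel(pixel, weights)
            # ... and the same kernel is reused for every channel.
            for c in range(channels):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernel[di * k + dj] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


if __name__ == "__main__":
    # Two channels, 2x2 spatial size; only the centre kernel entry is
    # non-zero and it copies channel 0 of the current pixel.
    features = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    weights = [[0.0, 0.0] for _ in range(9)]
    weights[4] = [1.0, 0.0]
    print(involution(features, 3, weights))
```

With `weights` chosen as in the example, each output value is the input value multiplied by channel 0 at the same pixel, which makes the location-specific, channel-shared behaviour easy to check by hand.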


Published in

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
January 2024, 480 pages
ISBN: 9798400716720
DOI: 10.1145/3647649

Copyright © 2024 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
