Spatial-Channel Specific Snake-YOLOv8 for Video Logo Detection in Live Streaming Scenes

Authors:
Wensheng Li, Faculty of Information Technology, Beijing University of Technology, China (ORCID: 0009-0007-6312-4157)
Jing Zhang, Faculty of Information Technology, Beijing University of Technology, China (ORCID: 0000-0003-1290-0738)
Chenyu Yuan, Faculty of Information Technology, Beijing University of Technology, China (ORCID: 0009-0000-8115-2230)

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing, January 2024, Pages 402–408. https://doi.org/10.1145/3647649.3647712

Published: 03 May 2024

ABSTRACT

Live video platforms have attracted many active streamers and daily users, and quickly understanding live video streaming scenes is crucial for ensuring a clean and healthy cyberspace. Video logos often appear in live video and can serve as key clues for understanding video streaming scenes. Because live streaming scenes contain jitter as well as blurred and deformed logos, we propose a spatial-channel specific Snake-YOLOv8 for video logo detection in live streaming scenes. First, we design an involutional fusion layer to fuse inter-frame logo features in jitter scenes. Then, we propose a spatial-channel specific involution that extracts spatial and channel context in order to adapt to blurred logos during detection. Finally, to cope with deformed logos in live streaming scenes, we embed snake deformation convolution into the spatial-channel specific involution. Experimental results show that, at an inference speed of 39.8 FPS, the proposed method reaches an mAP of 69.7% on LogoDet-3K and 53.0% on the self-built BJUT-VLD dataset, demonstrating its effectiveness and advantages for video logo detection in live streaming scenes.
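
As background for the involution-based components named in the abstract, the following is a minimal NumPy sketch of a generic involution operator, in which each spatial position gets its own kernel and that kernel is shared across the channels of a group. It is not the authors' spatial-channel specific involution, fusion layer, or snake convolution, whose details are not given on this page. The function name `involution2d`, the single linear kernel generator `w_gen`, and the zero padding are assumptions made purely for illustration.

```python
import numpy as np


def involution2d(x, w_gen, kernel_size=3, groups=1):
    """Generic involution sketch (illustrative only, not the paper's operator).

    x:     feature map of shape (C, H, W)
    w_gen: hypothetical linear kernel generator of shape (groups * K * K, C)
    """
    c, h, w = x.shape
    k = kernel_size
    pad = k // 2

    # Generate one K*K kernel per pixel and per group from that pixel's features.
    kernels = np.einsum("oc,chw->ohw", w_gen, x).reshape(groups, k * k, h, w)

    # Collect the K*K neighbourhood of every pixel (zero padding at the borders).
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    patches = np.stack(
        [xp[:, dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)],
        axis=1,
    )  # shape (C, K*K, H, W)
    patches = patches.reshape(groups, c // groups, k * k, h, w)

    # Multiply-add: the kernel is position-specific but shared within a group.
    out = (kernels[:, None] * patches).sum(axis=2)
    return out.reshape(c, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 8, 8))       # toy feature map
    gen = rng.standard_normal((2 * 9, 4)) * 0.1  # toy generator, 2 groups
    print(involution2d(feat, gen, kernel_size=3, groups=2).shape)
```

In contrast to a convolution, whose kernel is shared across positions and distinct per channel, the kernel here varies with position and is shared across channels. The paper's method presumably builds on this property, but that link is an assumption rather than something stated in the abstract.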

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene understanding

Published in

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing
January 2024
480 pages
ISBN:9798400716720
DOI:10.1145/3647649

Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 May 2024
Author Tags
Snake-YOLOv8
live streaming scenes
logo detection
spatial-channel specific
Qualifiers
- research-article
- Research
- Refereed limited