ABSTRACT
Live video platforms have attracted many active streamers and daily users, and quickly understanding live video streaming scenes is crucial for ensuring a clean and healthy cyberspace. Video logos often appear in live video and can serve as key clues for understanding these scenes. Because live streaming scenes contain jitter as well as blurred and deformed logos, we propose a spatial-channel specific Snake-YOLOv8 for video logo detection in live streaming scenes. First, we design an involutional fusion layer that fuses interframe logo features to handle jitter scenes. Then, we propose a spatial-channel specific involution that extracts spatial and channel context, so that detection adapts to blurred logos. Finally, to cope with deformed logos in live streaming scenes, we embed snake deformation convolution into the spatial-channel specific involution. Experimental results show that, at an inference speed of 39.8 FPS, the proposed method reaches an mAP of 69.7% on LogoDet-3K and 53.0% on the self-built BJUT-VLD dataset, which indicates that it is effective for video logo detection in live streaming scenes.
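As background for the involution operator that the method builds on, the following is a minimal pure-Python sketch of plain involution, not the spatial-channel specific variant proposed here, whose details the abstract does not give. The function name `involution2d` and the single linear map `weight` used to generate kernels (standing in for learned 1x1 convolutions) are assumptions made for illustration only.

```python
def involution2d(x, weight, kernel_size=3, groups=1):
    """Plain involution on a feature map stored as nested lists x[c][i][j].

    weight is a hypothetical kernel-generating matrix with
    groups * kernel_size**2 rows and C columns (an assumption for this sketch).
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k, pad = kernel_size, kernel_size // 2
    per_group = channels // groups
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            # The kernel is generated from the feature vector at (i, j),
            # so it differs from pixel to pixel (spatial-specific).
            pixel = [x[c][i][j] for c in range(channels)]
            kernel = [sum(w * p for w, p in zip(row, pixel)) for row in weight]
            for c in range(channels):
                g = c // per_group  # channels in one group share a kernel
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        ii, jj = i + u - pad, j + v - pad
                        # Positions outside the map act as zero padding.
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += kernel[g * k * k + u * k + v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


# Toy input: 2 channels, 2 x 2 pixels.
features = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
# With a 1 x 1 kernel generated from channel 0 only, each output value is the
# channel-0 value at that pixel multiplied by the input value.
result = involution2d(features, weight=[[1.0, 0.0]], kernel_size=1, groups=1)
```

In contrast to convolution, where one kernel is shared across all positions and differs per channel, the kernel here varies per position and is shared across the channels of a group. A production implementation would typically use a tensor library rather than nested loops.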