Study on location bias of CNN for shot scale classification

Chen, Zeyu; Zhang, Yana; Zhang, Suya; Yang, Cheng

doi:10.1007/s11042-022-13111-8


Published: 07 May 2022

Volume 81, pages 40289–40309, (2022)

Multimedia Tools and Applications

Zeyu Chen¹ (ORCID: orcid.org/0000-0003-2766-2031), Yana Zhang¹, Suya Zhang¹,² & Cheng Yang¹


Abstract

With the development of artificial intelligence, AI technology is increasingly being applied in the media industry, and automatic video editing is one such direction. In video editing, the shot scale is an important reference for shot arrangement. Existing algorithms tend to classify the shot scale with CNNs, but they fail to work well across frames with various aspect ratios. One focus of this paper is to explore the relationship between the pooling method and location bias in CNNs, so that location features and non-location features can be treated appropriately and classification performance improves on frames with various aspect ratios. In a set of experiments, we change the output feature maps of pooling (OFMP) to observe how a CNN classifies a group of images by location features and non-location features. Then, a vertical and horizontal pooling method (VH-Pooling) is proposed for robust shot scale classification, which achieves 94.24% accuracy on a multi-aspect-ratio shot scale dataset at a high operating speed. Finally, a practical shot scale classification system is designed with a post-processing module and successfully applied in a live news AI-editing platform.
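
The abstract does not spell out how VH-Pooling is computed, so the following is only an illustrative sketch of the general idea of pooling a feature map separately along the vertical and horizontal directions. It assumes a single feature map of shape (C, H, W) and shows how such pooling could yield a fixed-length descriptor for frames of different aspect ratios while still retaining coarse location information. The function names `adaptive_avg_1d` and `vh_pool` and the `bins` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def adaptive_avg_1d(x, bins):
    """Average the last axis of x into `bins` segments of roughly equal length."""
    n = x.shape[-1]
    idx = np.arange(bins)
    starts = (idx * n) // bins               # floor(i * n / bins)
    ends = -((-(idx + 1) * n) // bins)       # ceil((i + 1) * n / bins)
    return np.stack(
        [x[..., s:e].mean(axis=-1) for s, e in zip(starts, ends)], axis=-1
    )


def vh_pool(fmap, bins=4):
    """Hypothetical vertical/horizontal pooling of a (C, H, W) feature map.

    Returns a vector of length 2 * C * bins regardless of H and W.
    """
    # Collapse the width: (C, H) profile that keeps vertical position.
    vertical = adaptive_avg_1d(fmap.mean(axis=2), bins)
    # Collapse the height: (C, W) profile that keeps horizontal position.
    horizontal = adaptive_avg_1d(fmap.mean(axis=1), bins)
    return np.concatenate([vertical.ravel(), horizontal.ravel()])


if __name__ == "__main__":
    wide = np.random.rand(8, 9, 16)   # e.g. features from a landscape frame
    tall = np.random.rand(8, 16, 9)   # e.g. features from a portrait frame
    print(vh_pool(wide).shape, vh_pool(tall).shape)
```

In this sketch, both inputs map to descriptors of the same length, so a single classifier head could follow either one. Unlike global average pooling, the vertical profile still distinguishes activations near the top of the frame from those near the bottom, which is the kind of location feature the abstract refers to.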



Funding

This work was supported by the Fundamental Research Funds for the Central Universities under Grant CUC210B018 and the National Natural Science Foundation of China under Grant 61901422.

Author information

Authors and Affiliations

School of Information and Communication Engineering, Communication University of China, Beijing, 100024, China
Zeyu Chen, Yana Zhang, Suya Zhang & Cheng Yang
XinHuaZhiYunInc., No.28 Xuanwumenwai Street, Xicheng District, Beijing, 100075, China
Suya Zhang


Corresponding author

Correspondence to Zeyu Chen.

Ethics declarations

Conflict of Interests/Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zeyu Chen and Yana Zhang contributed equally to this work.


About this article

Cite this article

Chen, Z., Zhang, Y., Zhang, S. et al. Study on location bias of CNN for shot scale classification. Multimed Tools Appl 81, 40289–40309 (2022). https://doi.org/10.1007/s11042-022-13111-8


Received: 31 August 2021
Revised: 02 February 2022
Accepted: 10 April 2022
Published: 07 May 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11042-022-13111-8
