Abstract
With the development of artificial intelligence, AI technology is increasingly being applied in the media industry, and automatic video editing is one such direction. In video editing, shot scale is an important reference for arranging shots. Existing algorithms tend to classify shot scale with CNNs, but they do not work well across frames with differing aspect ratios. One focus of this paper is the relationship between the pooling method and location bias in CNNs, so that location features and non-location features can each be handled appropriately and classification improves on frames with various aspect ratios. In a set of experiments, we change the output feature maps of pooling (OFMP) to observe how a CNN classifies a group of images by location features and non-location features. We then propose a vertical and horizontal pooling method (VH-Pooling) for robust shot scale classification, which achieves 94.24% accuracy on a multi-aspect-ratio shot scale dataset while running at high speed. Finally, we design a practical shot scale classification system with a post-processing module and apply it successfully in a live news AI-editing platform.
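The abstract does not define VH-Pooling, so the following is only an illustrative sketch of the general idea it points at: pooling a feature map separately along the vertical and horizontal axes keeps coarse location information that global average pooling discards, while still producing a fixed-length output for any aspect ratio. The function names `global_average_pool` and `band_profile_pool`, the `bands` parameter, and the equal-width banding are all assumptions made for this example and should not be read as the authors' method.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def _band_edges(size: int, bands: int) -> List[Tuple[int, int]]:
    """Split range(size) into `bands` contiguous, non-empty, near-equal bands."""
    if not 1 <= bands <= size:
        raise ValueError("bands must be between 1 and size")
    return [(i * size // bands, (i + 1) * size // bands) for i in range(bands)]


def global_average_pool(fmap: Matrix) -> float:
    """Average over every position: all location information is discarded."""
    h, w = len(fmap), len(fmap[0])
    return sum(sum(row) for row in fmap) / (h * w)


def band_profile_pool(fmap: Matrix, bands: int) -> List[float]:
    """Hypothetical vertical + horizontal pooling (not the paper's definition).

    Returns `bands` vertical-band means followed by `bands` horizontal-band
    means, so the output length is 2 * bands for any input height and width.
    """
    h, w = len(fmap), len(fmap[0])
    # Collapse each row / column to its mean activation.
    row_means = [sum(row) / w for row in fmap]
    col_means = [sum(fmap[r][c] for r in range(h)) / h for c in range(w)]
    # Average the profiles inside each band to get a fixed-length descriptor.
    vertical = [sum(row_means[a:b]) / (b - a) for a, b in _band_edges(h, bands)]
    horizontal = [sum(col_means[a:b]) / (b - a) for a, b in _band_edges(w, bands)]
    return vertical + horizontal


# Toy single-channel feature maps with different aspect ratios.
wide = [[1.0, 1.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 0.0]]           # 2 x 4, activation in the top half
tall = [[1.0, 1.0], [1.0, 1.0],
        [0.0, 0.0], [0.0, 0.0]]         # 4 x 2, activation in the top half
flipped = [[0.0, 0.0, 0.0, 0.0],
           [1.0, 1.0, 1.0, 1.0]]        # 2 x 4, activation in the bottom half

print(band_profile_pool(wide, bands=2))
print(band_profile_pool(tall, bands=2))
print(global_average_pool(wide), global_average_pool(flipped))
```

In this toy case the wide and tall maps yield the same descriptor because the activation sits in the same relative position, whereas global average pooling cannot tell `wide` from `flipped` at all. A real network would apply such pooling per channel on learned feature maps.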
Funding
This work was supported by the Fundamental Research Funds for the Central Universities under Grant CUC210B018 and the National Natural Science Foundation of China under Grant 61901422.
Ethics declarations
Conflict of Interests/Competing interests
The authors declare no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Zeyu Chen and Yana Zhang contributed equally to this work.
Cite this article
Chen, Z., Zhang, Y., Zhang, S. et al. Study on location bias of CNN for shot scale classification. Multimed Tools Appl 81, 40289–40309 (2022). https://doi.org/10.1007/s11042-022-13111-8
DOI: https://doi.org/10.1007/s11042-022-13111-8