Study on location bias of CNN for shot scale classification

Multimedia Tools and Applications

Abstract

With the development of artificial intelligence, AI technology is increasingly being applied in the media industry, and automatic video editing is one such direction. In video editing, the shot scale is an important reference for shot arrangement. Existing algorithms tend to classify shot scale with CNNs, but they do not work well across frames with various aspect ratios. One focus of this paper is to explore the relationship between the pooling method and location bias in CNNs, so that location features and non-location features can be treated appropriately and classification performance improves on frames with various aspect ratios. In a set of experiments, we change the output feature maps of pooling (OFMP) to observe how a CNN classifies a group of images by location features and non-location features. We then propose a vertical and horizontal pooling method (VH-Pooling) for robust shot scale classification, which achieves 94.24% accuracy on a multi-aspect-ratio shot scale dataset at a high operating speed. Finally, a practical shot scale classification system with a post-processing module is designed and successfully applied in a live news AI-editing platform.
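
The abstract does not say how VH-Pooling is computed, so the following is only a rough illustration of the general idea of pooling along each axis separately, not the authors' method. It assumes a feature map of shape (channels, height, width), collapses one spatial axis at a time by averaging, and then averages each resulting profile into a fixed number of bins so that the descriptor length does not depend on the aspect ratio. The names `adaptive_avg_bins` and `vh_pool_sketch`, the choice of average pooling, and the `bins` parameter are all assumptions made for this sketch.

```python
import numpy as np


def adaptive_avg_bins(x, bins):
    """Average the last axis of x into `bins` roughly equal segments (hypothetical helper)."""
    n = x.shape[-1]
    edges = np.linspace(0, n, bins + 1)
    starts = np.floor(edges[:-1]).astype(int)
    ends = np.ceil(edges[1:]).astype(int)
    # Each segment is non-empty because ceil(end) > floor(start) whenever n >= 1.
    return np.stack(
        [x[..., s:e].mean(axis=-1) for s, e in zip(starts, ends)], axis=-1
    )


def vh_pool_sketch(fmap, bins=4):
    """Illustrative axis-wise pooling; NOT the paper's VH-Pooling definition.

    fmap: array of shape (C, H, W).
    Returns a 1-D descriptor of length 2 * C * bins.
    """
    rows = fmap.mean(axis=2)  # (C, H): width collapsed, vertical position kept
    cols = fmap.mean(axis=1)  # (C, W): height collapsed, horizontal position kept
    v = adaptive_avg_bins(rows, bins)  # (C, bins)
    h = adaptive_avg_bins(cols, bins)  # (C, bins)
    return np.concatenate([v.ravel(), h.ravel()])


# Feature maps with different aspect ratios give descriptors of equal length.
wide = np.random.rand(8, 9, 16)
tall = np.random.rand(8, 16, 9)
print(vh_pool_sketch(wide).shape, vh_pool_sketch(tall).shape)
```

In this sketch the vertical profile still records where along the height an activation occurs, while the fixed bin count lets a single classifier head accept frames of different aspect ratios. Whether the paper bins, averages, or concatenates in this way is not stated in the abstract.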

Funding

This work was supported by the Fundamental Research Funds for the Central Universities under Grant CUC210B018 and the National Natural Science Foundation of China under Grant 61901422.

Author information

Corresponding author

Correspondence to Zeyu Chen.

Ethics declarations

Conflict of Interests/Competing interests

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Zeyu Chen and Yana Zhang contributed equally to this work.

About this article

Cite this article

Chen, Z., Zhang, Y., Zhang, S. et al. Study on location bias of CNN for shot scale classification. Multimed Tools Appl 81, 40289–40309 (2022). https://doi.org/10.1007/s11042-022-13111-8

  • DOI: https://doi.org/10.1007/s11042-022-13111-8
