Abstract
Video is the most widely used media format. Automating the editing process would impact many areas, from the film industry to social media content. The editing process defines the structure of a video. In this paper, we present a new method to analyze and characterize the structure of 30-second videos. Specifically, we study video structure in terms of sequences of shots. We investigate the relation between what is shown in a video and the sequence of shots used to represent it, and whether it is possible to define editing classes. Labeled data are needed for this purpose, but unfortunately none are available; hence, new data-driven methodologies must be developed to address this issue. In this paper we present Movie Lens, a data-driven approach to discover and characterize editing patterns in the analysis of short movie sequences. The approach relies on the Levenshtein distance, the K-Means algorithm, and a Multilayer Perceptron (MLP). Using the Levenshtein distance and the K-Means algorithm, we indirectly label 30-second movie shot sequences. We then train a Multilayer Perceptron to assess the validity of our approach. Additionally, the MLP helps domain experts assess the semantic concepts encapsulated by the identified clusters. We extracted data from the CineScale dataset, gathering 23,887 shot sequences, each 30 seconds long, from 120 different movies. The accuracy of Movie Lens varies from 93% to 77% as the number of classes considered grows from 4 to 32. We also present a preliminary characterization of the identified classes and their editing patterns in the 16-class scenario, where the approach reaches an overall accuracy of 81%.
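The distance step of the pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes shot sequences are encoded as strings of shot-type labels (the one-letter coding below is hypothetical), compares them with the Levenshtein edit distance, and builds the pairwise distance matrix that a clustering step such as K-Means could consume.

```python
# Sketch of the Levenshtein-based comparison of shot sequences.
# Assumption (not from the paper): each shot is encoded as one letter,
# e.g. "C" = close-up, "M" = medium shot, "L" = long shot.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Toy 30-second shot sequences (one letter per shot).
sequences = ["CMLMC", "CMLLC", "LLLML", "CCMCC"]

# Pairwise distance matrix, usable as input to a clustering step
# (the paper clusters such sequences with K-Means; medoid-based
# clustering is a common alternative for non-vector data).
matrix = [[levenshtein(s, t) for t in sequences] for s in sequences]

print(levenshtein("CMLMC", "CMLLC"))  # 1: a single substitution
```

In practice one would feed such a distance matrix (or distance-based features) to the clustering stage and use the resulting cluster labels as indirect training targets for the MLP.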
Notes
1. The Multilayer Perceptron for sentence classification can be retrieved from a GitHub repository [32].
References
Argaw, D.M., Heilbron, F.C., Lee, J.Y., Woodson, M., Kweon, I.: The anatomy of video editing: a dataset and benchmark suite for AI-assisted video editing. ArXiv abs/2207.09812 (2022)
Bain, M., Nagrani, A., Brown, A., Zisserman, A.: Condensed movies: story based retrieval with contextual embeddings. CoRR abs/2005.04208 (2020). https://arxiv.org/abs/2005.04208
Bak, H.Y., Park, S.B.: Comparative study of movie shot classification based on semantic segmentation. Appl. Sci. 10(10), 3390 (2020). https://doi.org/10.3390/app10103390
Benini, S., Savardi, M., Balint, K., Kovacs, A., Signoroni, A.: On the influence of shot scale on film mood and narrative engagement in film viewers. IEEE Trans. Affect. Comput. 13(2), 592–603 (2022). https://doi.org/10.1109/taffc.2019.2939251
Berthouzoz, F., Li, W., Agrawala, M.: Tools for placing cuts and transitions in interview video. ACM Trans. Graph. 31, 1–8 (2012). https://doi.org/10.1145/2185520.2335418
Bloemheuvel, S., van den Hoogen, J., Jozinovic, D., Michelini, A., Atzmueller, M.: Multivariate time series regression with graph neural networks. CoRR abs/2201.00818 (2022). https://arxiv.org/abs/2201.00818
Chakraborty, S., Nagwani, N., Dey, L.: Performance comparison of incremental K-means and incremental DBSCAN algorithms. Int. J. Comput. Appl. 27, 975–8887 (2011)
Haldar, R., Mukhopadhyay, D.: Levenshtein distance technique in dictionary lookup methods: an improved approach. CoRR (2011)
Hasan, M.A., Xu, M., He, X., Xu, C.: CAMHID: camera motion histogram descriptor and its application to cinematographic shot classification. IEEE Trans. Circuits Syst. Video Technol. 24(10), 1682–1695 (2014). https://doi.org/10.1109/TCSVT.2014.2345933
He, Z., Gao, S., Xiao, L., Liu, D., He, H., Barber, D.: Wider and deeper, cheaper and faster: tensorized LSTMs for sequence learning (2017)
Jani, K., Chaudhuri, M., Patel, H., Shah, M.: Machine learning in films: an approach towards automation in film censoring. J. Data Inf. Manage. 2(1), 55–64 (2019). https://doi.org/10.1007/s42488-019-00016-9
Juang, B.H., Rabiner, L.: The segmental k-means algorithm for estimating parameters of hidden Markov models. IEEE Trans. Acoust. Speech Signal Process. 38(9), 1639–1641 (1990)
Liberti, L., Lavor, C., Maculan, N., Mucherino, A.: Euclidean distance geometry and applications. SIAM Rev. 56 (2012). https://doi.org/10.1137/120875909
Matsuo, Y., Amano, M., Uehara, K.: Mining video editing rules in video streams, pp. 255–258 (2002). https://doi.org/10.1145/641007.641058
Mogadala, A., Kalimuthu, M., Klakow, D.: Trends in integration of vision and language research: a survey of tasks, datasets, and methods. J. Artif. Int. Res. 71, 1183–1317 (2021). https://doi.org/10.1613/jair.1.11688
Murch, W.: In the Blink of an Eye. Silman-James Press (2001)
Nothelfer, C., DeLong, J., Cutting, J.E.: Shot structure in Hollywood film (2009)
Pardo, A., Heilbron, F.C., Alcázar, J.L., Thabet, A.K., Ghanem, B.: Learning to cut by watching movies. CoRR abs/2108.04294 (2021). https://arxiv.org/abs/2108.04294
Podlesnyy, S.: Towards data-driven automatic video editing (2019)
Qaisar, S.: Sentiment analysis of IMDB movie reviews using long short-term memory (2020). https://doi.org/10.1109/ICCIS49240.2020.9257657
Ramesh, A., et al.: Zero-shot text-to-image generation (2021). https://doi.org/10.48550/ARXIV.2102.12092. https://arxiv.org/abs/2102.12092
Rao, A., Wang, J., Xu, L., Jiang, X., Huang, Q., Zhou, B., Lin, D.: A unified framework for shot type classification based on subject centric lens. CoRR abs/2008.03548 (2020). https://arxiv.org/abs/2008.03548
Ren, J., Shen, X., Lin, Z., Měch, R.: Best frame selection in a short video. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 3201–3210 (2020). https://doi.org/10.1109/WACV45572.2020.9093615
Savardi, M., Kovács, A.B., Signoroni, A., Benini, S.: CineScale: a dataset of cinematic shot scale in movies. Data Brief 36, 107002 (2021)
Savardi, M., Signoroni, A., Migliorati, P., Benini, S.: Shot scale analysis in movies by convolutional neural networks, pp. 2620–2624 (2018). https://doi.org/10.1109/ICIP.2018.8451474
Simões, G., Wehrmann, J., Barros, R., Ruiz, D.: Movie genre classification with convolutional neural networks, pp. 259–266 (2016). https://doi.org/10.1109/IJCNN.2016.7727207
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014). http://arxiv.org/abs/1409.1556
Soe, T.H.: Automation in video editing: assisted workflows in video editing. In: AutomationXP@CHI (2021)
Svanera, M., Savardi, M., Signoroni, A., Kovács, A.B., Benini, S.: Who is the film’s director? authorship recognition based on shot features. IEEE Multimedia 26(4), 43–54 (2019). https://doi.org/10.1109/MMUL.2019.2940004
Vacchetti, B., Cerquitelli, T.: Cinematographic shot classification with deep ensemble learning. Electronics 11(10), 1570 (2022)
Vacchetti, B., Cerquitelli, T., Antonino, R.: Cinematographic shot classification through deep learning. In: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC), pp. 345–350 (2020). https://doi.org/10.1109/COMPSAC48688.2020.0-222
Walters, A.: Sentence classification. https://github.com/lettergram/sentence-classification
Wang, M., Yang, G.W., Hu, S.M., Yau, S.T., Shamir, A.: Write-a-video: computational video montage from themed text. ACM Trans. Graph. 38(6), 1–13 (2019). https://doi.org/10.1145/3355089.3356520
Wu, H.Y., Santarra, T., Leece, M., Vargas, R., Jhala, A.: Joint attention for automated video editing. In: ACM International Conference on Interactive Media Experiences, pp. 55–64. IMX 2020, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3391614.3393656
Zhang, X., Li, Y., Han, Y., Wen, J.: AI video editing: a survey (2021). https://doi.org/10.20944/preprints202201.0016.v1
Zhou, H., Hermans, T., Karandikar, A., Rehg, J.: Movie genre classification via scene categorization, pp. 747–750 (2010). https://doi.org/10.1145/1873951.1874068
Zhou, J., Zhang, X.P.: Automatic identification of digital video based on shot-level sequence matching. In: Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 515–518. MULTIMEDIA 2005, Association for Computing Machinery, New York, NY, USA (2005). https://doi.org/10.1145/1101149.1101265
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Vacchetti, B., Cerquitelli, T. (2023). Movie Lens: Discovering and Characterizing Editing Patterns in the Analysis of Short Movie Sequences. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13804. Springer, Cham. https://doi.org/10.1007/978-3-031-25069-9_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25068-2
Online ISBN: 978-3-031-25069-9