Abstract
Scene segmentation is the task of dividing a video into groups of consecutive frames with a high degree of semantic similarity. In this paper, we contribute to video scene segmentation by creating a novel dataset for temporal scene segmentation. In addition, we propose a combination of two deep models that classifies whether two video frames belong to the same scene or to different scenes. The first model is a triplet network composed of three instances of the same 2D convolutional network; these instances form a multi-scale network that embeds frames efficiently according to their similarity, and we feed it with an efficient triplet sampling algorithm. The second model, a fine-tuned Siamese network, classifies whether a pair of embeddings corresponds to frames from different scenes.
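The objective a triplet network optimizes can be illustrated with the standard triplet margin loss: the anchor and positive (same-scene) embeddings are pulled together, while the negative (different-scene) embedding is pushed at least a margin farther away. This is a minimal sketch, not the paper's exact formulation; the 4-dimensional embeddings below are invented for illustration.

```python
import math

def euclidean(u, v):
    # L2 distance between two embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: zero when the negative is already at least
    `margin` farther from the anchor than the positive is."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

# Hypothetical frame embeddings (in the paper these would come from the
# shared 2D convolutional network; the values here are illustrative only).
a = [0.0, 0.0, 0.0, 0.0]   # anchor frame
p = [0.1, 0.0, 0.0, 0.0]   # frame from the same scene
n = [2.0, 0.0, 0.0, 0.0]   # frame from a different scene
print(triplet_loss(a, p, n))  # → 0.0 (this triplet already satisfies the margin)
```

Because all three instances share weights, frames from the same scene end up close in the embedding space, which is what lets the downstream Siamese classifier decide same-scene versus different-scene from the embeddings alone.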
Acknowledgment
We would first like to thank Telefónica I+D for supporting the Industrial PhD of Miguel Esteve Brotons. We thank the "A way of making Europe" European Regional Development Fund (ERDF) and MCIN/AEI/10.13039/501100011033 for supporting this work under the TED2021-130890B (CHAN-TWIN) research project funded by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR, and the AICARE project (grant SPID202200X139779IV0). We also acknowledge HORIZON-MSCA-2021-SE-0 action number 101086387, REMARKABLE (Rural Environmental Monitoring via ultra wide-ARea networKs And distriButed federated Learning). Finally, we would like to thank NVIDIA for the generous hardware donations that made these experiments possible.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Esteve Brotons, M.J., Carmona Blanco, J., Lucendo, F.J., García-Rodríguez, J. (2023). Video Scene Segmentation Based on Triplet Loss Ranking. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2023. Lecture Notes in Computer Science, vol 14134. Springer, Cham. https://doi.org/10.1007/978-3-031-43085-5_24
DOI: https://doi.org/10.1007/978-3-031-43085-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43084-8
Online ISBN: 978-3-031-43085-5
eBook Packages: Computer Science, Computer Science (R0)