Abstract
We present a novel video browsing and retrieval system for edited videos, based on scene detection and automatic tagging. In the proposed system, database videos are automatically decomposed into meaningful and storytelling parts (i.e. scenes) and tagged in an automatic way by leveraging their transcript. We rely on computer vision and machine learning techniques to learn the optimal scene boundaries: a Triplet Deep Neural Network is trained to distinguish video sequences belonging to the same scene and sequences from different scenes, by exploiting multimodal features from images, audio and captions. The system also features a user interface build as a set of extensions to the eXo Platform Enterprise Content Management System (ECMS) (https://www.exoplatform.com/). This set of extensions enable the interactive visualization of a video, its automatic and semi-automatic annotation, as well as a keyword-based search inside the video collection. The platform also allows a natural integration with third-party add-ons, so that automatic annotations can be exploited outside the proposed platform.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
References
Apostolidis, E., Mezaris, V.: Fast shot segmentation combining global and local visual descriptors. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6583–6587 (2014)
Balducci, F., Grana, C., Cucchiara, R.: Affective level design for a role-playing videogame evaluated by a brain-computer interface and machine learning methods. Vis. Comput. 33(4), 413–427 (2017)
Ballan, L., Bertini, M., Serra, G., Del Bimbo, A.: A data-driven approach for tag refinement and localization in web videos. Comput. Vis. Image Underst. 140, 58–67 (2015)
Baraldi, L., Grana, C., Cucchiara, R.: A deep siamese network for scene detection in broadcast videos. In: Proceedings of the 23rd Annual ACM Conference on Multimedia Conference (MM 2015), pp. 1199–1202 (2015). http://doi.acm.org/10.1145/2733373.2806316
Baraldi, L., Grana, C., Cucchiara, R.: Scene-driven retrieval in edited videos using aesthetic and semantic deep features. In: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval (ICMR 2016), pp. 23–29 (2016). http://doi.acm.org/10.1145/2911996.2912012
Baraldi, L., Grana, C., Cucchiara, R.: Recognizing and presenting the storytelling video structure with deep multimodal networks. IEEE Trans. Multimed. 19(5), 955–968 (2017)
Bolelli, F., Borghi, G., Grana, C.: Historical handwritten text images word spotting through sliding window HOG features. In: 19th International Conference on Image Analysis and Processing (2017)
Chasanis, V.T., Likas, C., Galatsanos, N.P.: Scene detection in videos using shot clustering and sequence alignment. IEEE Trans. Multimed. 11(1), 89–100 (2009)
Hanjalic, A., Lagendijk, R.L., Biemond, J.: Automated high-level movie segmentation for advanced video-retrieval systems. IEEE Trans. Circ. Syst. Video Technol. 9(4), 580–588 (1999)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Lew, M.S., Sebe, N., Djeraba, C., Jain, R.: Content-based multimedia information retrieval: state of the art and challenges. ACM Trans. Multimed. Comput. Commun. Appl. (TOMCCAP) 2(1), 1–19 (2006)
Lin, D., Fidler, S., Kong, C., Urtasun, R.: Visual semantic search: retrieving videos via complex textual queries. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2657–2664, June 2014
Liu, C., Wang, D., Zhu, J., Zhang, B.: Learning a contextual/multi-thread model for movie/TV scene segmentation. IEEE Trans. Multimed. 15(4), 884–897 (2013)
Logan, B., et al.: Mel frequency cepstral coefficients for music modeling. In: ISMIR (2000)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
Rasheed, Z., Shah, M.: Detection and representation of scenes in videos. IEEE Trans. Multimed. 7(6), 1097–1105 (2005)
Sharif Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813 (2014)
Sidiropoulos, P., Mezaris, V., Kompatsiaris, I., Meinedo, H., Bugalho, M., Trancoso, I.: Temporal video segmentation to scenes using high-level audiovisual features. IEEE Trans. Circ. Syst. Video Technol. 21(8), 1163–1177 (2011)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint (2014). arXiv:1409.1556
Sivic, J., Zisserman, A.: Video google: a text retrieval approach to object matching in videos. In: IEEE International Conference on Computer Vision, pp. 1470–1477. IEEE (2003)
Snoek, C.G., Huurnink, B., Hollink, L., De Rijke, M., Schreiber, G., Worring, M.: Adding semantics to detectors for video retrieval. IEEE Trans. Multimed. 9(5), 975–986 (2007)
Yeung, M.M., Yeo, B.L., Wolf, W.H., Liu, B.: Video browsing using clustering and scene transitions on compressed sequences. In: IS&T/SPIE’s Symposium on Electronic Imaging: Science & Technology, pp. 399–413 (1995)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Baraldi, L., Grana, C., Cucchiara, R. (2017). A Video Library System Using Scene Detection and Automatic Tagging. In: Grana, C., Baraldi, L. (eds) Digital Libraries and Archives. IRCDL 2017. Communications in Computer and Information Science, vol 733. Springer, Cham. https://doi.org/10.1007/978-3-319-68130-6_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-68130-6_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68129-0
Online ISBN: 978-3-319-68130-6
eBook Packages: Computer ScienceComputer Science (R0)