Abstract
This paper discusses open problems in detecting shot boundaries in music videos. The number of shots per second and the types of transition used are considered discriminating features for music videos and potential multi-modal music features. By providing an extensive list of effects and transition types that are rare in cinematic productions but common in music videos, we emphasize the artistic use of transitions in music videos. Using examples, we discuss in detail the shortcomings of state-of-the-art approaches and suggest ways to address them.
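To make the shots-per-second feature concrete, the following is a minimal sketch of how it could be computed once shot boundaries are known. It is not the authors' implementation: the function name `shots_per_second`, its parameters, and the assumption that a detector returns the frame indices at which new shots begin are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def shots_per_second(boundary_frames: Sequence[int],
                     total_frames: int,
                     fps: float) -> float:
    """Return the average number of shots per second of video.

    boundary_frames: frame indices at which a new shot starts. This is an
    assumed detector output format; the first shot, starting at frame 0,
    is implicit and does not need to be listed.
    """
    if total_frames <= 0 or fps <= 0:
        raise ValueError("total_frames and fps must be positive")

    # Keep only boundaries strictly inside the video and collapse duplicates.
    inner_boundaries = {b for b in boundary_frames if 0 < b < total_frames}

    # N inner boundaries split the video into N + 1 shots.
    num_shots = len(inner_boundaries) + 1
    duration_seconds = total_frames / fps
    return num_shots / duration_seconds


# Example: a 10-second clip at 25 fps with three detected boundaries.
rate = shots_per_second([50, 100, 150], total_frames=250, fps=25.0)
print(rate)
```

The sketch treats every boundary as an instantaneous cut. Gradual transitions such as dissolves, which span many frames, would need a convention for where the boundary is counted.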
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Schindler, A., Rauber, A. (2019). On the Unsolved Problem of Shot Boundary Detection for Music Videos. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_43
DOI: https://doi.org/10.1007/978-3-030-05710-7_43
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05709-1
Online ISBN: 978-3-030-05710-7
eBook Packages: Computer Science, Computer Science (R0)