Abstract
Rapid growth in storage technology and data acquisition has significantly increased the volume of multimedia data online. Analyzing multimedia data in such massive quantities is a challenging problem. In recent years, content-based indexing of video files has gained popularity in the research community. There have also been attempts to identify whether a video clip belongs to a specific genre, e.g., sports video, movie, drama, animation, or talk show. These techniques, however, rely on a long list of audio-visual features to perform the classification, which reduces processing efficiency. Based on certain patterns in audio-visual features and the basic grammar of talk shows, this research differentiates talk-show content from the rest of the video genres. Our multimodal rule-based classification approach exploits shots and scenes in a video as classification features. Content from popular multimedia platforms such as DailyMotion and YouTube, together with Hollywood and Bollywood movies, is used as the dataset to test the overall genre identification system. On 600 selected videos totaling more than 600 h, the system achieves a precision of 98% and a recall of 100% when classifying multimedia content as either ‘TalkShow’ or ‘OtherVideo’.
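For readers less familiar with the two reported metrics, the following minimal sketch shows how precision and recall are conventionally computed for a two-class problem with ‘TalkShow’ as the positive class. The function name `precision_recall` and the toy labels are assumptions made for illustration only; they are not the authors' code or data.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    positive: str = "TalkShow",
) -> Tuple[float, float]:
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: talk shows correctly labelled as talk shows.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: other videos wrongly labelled as talk shows.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: talk shows that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (hypothetical, NOT the dataset described in the abstract).
truth = ["TalkShow", "TalkShow", "OtherVideo", "OtherVideo", "OtherVideo"]
preds = ["TalkShow", "TalkShow", "TalkShow", "OtherVideo", "OtherVideo"]
precision, recall = precision_recall(truth, preds)
```

In this toy case every talk show is found, so recall is 1.0, while one non-talk-show video is mislabelled, so precision falls below 1.0. That is the same qualitative pattern as a recall of 100% paired with a slightly lower precision.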
Cite this article
Daudpota, S.M., Muhammad, A. & Baber, J. Video genre identification using clustering-based shot detection algorithm. SIViP 13, 1413–1420 (2019). https://doi.org/10.1007/s11760-019-01488-3
DOI: https://doi.org/10.1007/s11760-019-01488-3