Multimodal genre classification of TV programs and YouTube videos

Ekenel, Hazım Kemal; Semela, Tomas

doi:10.1007/s11042-011-0923-x

Published: 24 November 2011

Multimedia Tools and Applications, Volume 63, pages 547–567 (2013)

Abstract

This paper presents an automatic video genre classification system that classifies the types of TV programs and YouTube videos using several low-level audio-visual features, cognitive and structural information, and, for web videos, tag-based features. Classification is performed with an ensemble of support vector machines. The visual descriptors consist of color- and texture-based features, which are often used to represent the concepts appearing in a video. The audio descriptors are signal energy, zero crossing rate, fundamental frequency, and mel-frequency cepstral coefficients, which together represent a wide range of perceptual cues available in the audio signal. Cognitive descriptors correspond to the information derived from a face detector, whereas structural descriptors are related to the shot editing of the video. A tag descriptor, based on the term frequency-inverse document frequency (tf-idf) measure, is additionally used for the genre classification of YouTube videos. For each feature and each genre, a separate support vector machine classifier is trained following the one-vs-all scheme. The outputs of the classifiers are then combined to yield the final classification result. The proposed system is extensively evaluated on complete TV programs from the Italian RAI TV channel and from a French TV channel, and on videos from YouTube. Using only the audio-visual cues together with cognitive and structural information, correct classification rates of 99.2%, 94.5%, and 87.3% are attained on these three datasets, respectively. These results show that the developed system can reliably determine the genre of TV programs. Adding the tag feature to the content-based features increases the YouTube genre classification performance from 87.3% to 89.7%. Further experiments indicate that video quality does not influence the results significantly.
The performance drop in classifying the genres of YouTube videos is found to be mainly due to the large variety of content these videos contain. In summary, this study shows that the proposed low-level visual feature set, which we have used to represent the concepts appearing in a video, also provides robust cues for genre classification. In addition, the obtained genre information is expected to provide additional cues that can be used to improve the performance of the concept detection system. It has also been shown that an ensemble of support vector machine classifiers outperforms the neural network based classification proposed in previous state-of-the-art genre classification systems (Montagnuolo and Messina, AIIA, LNAI 4733:730–741, 2007; Multimed Tools Appl 41(1):125–159, 2009). Besides the improvements in the employed feature set and classification scheme, the experimental framework of the study is exemplary in its extensive tests on different domains, ranging from TV programs from different countries to web videos.
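The abstract does not spell out which color and texture descriptors are used. As a minimal sketch of the color side only, the code below computes a quantized joint RGB histogram for one keyframe, in the spirit of Swain and Ballard's color indexing (listed in the references), plus histogram intersection as a similarity measure. Pixels are assumed to be 8-bit RGB tuples, and the default of 4 bins per channel is an illustrative choice, not the paper's setting; texture features are omitted.

```python
from typing import Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Joint RGB histogram with bins_per_channel**3 cells, normalized to sum to 1."""
    step = 256 // bins_per_channel

    def quantize(v: int) -> int:
        # Clamp so values near 255 stay in the last bin when 256 is not divisible.
        return min(v // step, bins_per_channel - 1)

    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        idx = (quantize(r) * bins_per_channel + quantize(g)) * bins_per_channel + quantize(b)
        hist[idx] += 1.0
        count += 1
    if count == 0:
        raise ValueError("frame contains no pixels")
    return [h / count for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Swain-Ballard similarity: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

A clip-level descriptor could then be formed by averaging keyframe histograms, although how the paper pools over frames is not stated in the abstract.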
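Two of the four audio descriptors named in the abstract are simple enough to sketch directly. The code below computes short-time energy and zero crossing rate on fixed-length frames and summarizes each over a clip by its mean and standard deviation. The frame length and hop (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz sampling rate) and the mean/std pooling are illustrative assumptions. Fundamental frequency and MFCCs are omitted because they need a pitch tracker and a mel filterbank; the reference list cites RAPT and the VOICEBOX toolbox in this area.

```python
import math
from typing import List, Sequence, Tuple


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a mono signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len] for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Population mean and standard deviation."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def audio_descriptor(samples: Sequence[float], frame_len: int = 400,
                     hop: int = 160) -> Tuple[float, float, float, float]:
    """Clip-level vector: (mean energy, std energy, mean ZCR, std ZCR) over all frames."""
    frames = frame_signal(samples, frame_len, hop)
    if not frames:
        raise ValueError("signal shorter than one frame")
    e_mean, e_std = mean_std([short_time_energy(f) for f in frames])
    z_mean, z_std = mean_std([zero_crossing_rate(f) for f in frames])
    return e_mean, e_std, z_mean, z_std
```

Speech and music tend to differ in how much these frame-level values fluctuate, which is why the sketch keeps the standard deviation alongside the mean.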
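The abstract says only that the cognitive descriptors are derived from a face detector (Viola and Jones's detector appears in the reference list). Assuming per-frame face bounding boxes are already available from such a detector, one plausible way to turn them into a fixed-length clip descriptor is sketched below. The four statistics are hypothetical choices for illustration, not the paper's documented feature set.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of one detected face


def cognitive_descriptor(faces_per_frame: Sequence[Sequence[Box]],
                         frame_area: int) -> Dict[str, float]:
    """Summarize face-detector output over a clip (hypothetical feature set)."""
    n = len(faces_per_frame)
    if n == 0 or frame_area <= 0:
        raise ValueError("need at least one frame and a positive frame area")
    counts = [len(faces) for faces in faces_per_frame]
    # Fraction of each frame covered by faces (overlaps between boxes are ignored).
    coverage = [sum(w * h for _, _, w, h in faces) / frame_area for faces in faces_per_frame]
    return {
        "frames_with_face": sum(1 for c in counts if c > 0) / n,
        "mean_faces": sum(counts) / n,
        "max_faces": float(max(counts)),
        "mean_face_coverage": sum(coverage) / n,
    }
```

Statistics of this kind would, for example, separate a news program dominated by a single large anchor face from a cartoon with few or no detected faces.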
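Structural descriptors are described only as being related to the shot editing of the video. A common way to capture editing rhythm, assumed here rather than taken from the paper, is to detect hard cuts by thresholding the distance between consecutive frames' normalized color histograms and then derive cut frequency and mean shot length. The threshold of 0.5 and the 25 fps default are illustrative, and gradual transitions such as fades and dissolves are not handled.

```python
from typing import Dict, List, Sequence


def detect_cuts(frame_hists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Return indices i such that a hard cut is declared between frame i-1 and frame i.

    The distance is half the L1 difference of normalized histograms, so it lies in [0, 1].
    """
    cuts = []
    for i in range(1, len(frame_hists)):
        dist = 0.5 * sum(abs(a - b) for a, b in zip(frame_hists[i - 1], frame_hists[i]))
        if dist > threshold:
            cuts.append(i)
    return cuts


def structural_descriptor(frame_hists: Sequence[Sequence[float]], fps: float = 25.0,
                          threshold: float = 0.5) -> Dict[str, float]:
    """Editing-rhythm features: cut frequency and mean shot length in seconds."""
    n_frames = len(frame_hists)
    if n_frames == 0 or fps <= 0:
        raise ValueError("need at least one frame and a positive frame rate")
    cuts = detect_cuts(frame_hists, threshold)
    bounds = [0] + cuts + [n_frames]
    shot_lengths = [(end - start) / fps for start, end in zip(bounds, bounds[1:])]
    duration_s = n_frames / fps
    return {
        "cuts_per_minute": 60.0 * len(cuts) / duration_s,
        "mean_shot_length_s": sum(shot_lengths) / len(shot_lengths),
    }
```

For instance, 150 frames at 25 fps made of three equal 50-frame shots give two cuts in 6 s, i.e. 20 cuts per minute and a mean shot length of 2 s.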
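The tag descriptor for YouTube videos is based on tf-idf. A minimal sketch of the plain weighting, with tf as the occurrences of a tag in a video divided by that video's number of tags and idf = log(N / df), is given below. The paper may use a different tf or idf variant, vector normalization, or vocabulary pruning; tag preprocessing such as lowercasing or stemming is left out here.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(tag_lists: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse tf-idf vector per video: tf = count / #tags, idf = log(N / df)."""
    n_videos = len(tag_lists)
    # Document frequency: in how many videos each tag occurs at least once.
    df = Counter(tag for tags in tag_lists for tag in set(tags))
    vectors = []
    for tags in tag_lists:
        tf = Counter(tags)
        total = len(tags)
        vectors.append({t: (c / total) * math.log(n_videos / df[t]) for t, c in tf.items()})
    return vectors
```

A tag that occurs in every video gets weight 0, so frequent but uninformative tags contribute nothing, while tags specific to a few videos are weighted up.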
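The abstract states that one support vector machine is trained per feature and per genre following the one-vs-all scheme, and that the classifier outputs are combined, without saying how. The sketch below shows two ingredients under stated assumptions: relabeling training data for one genre's binary classifier, and fusing per-feature, per-genre probability outputs with an optionally weighted sum rule followed by an argmax. The SVM training itself is left out; probability outputs of this kind could come from LIBSVM's Platt-style estimates (both cited in the references). The function names, the sum rule, and the genre and feature names in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping, Optional, Sequence

# feature name -> genre -> probability output of that feature's one-vs-all classifier
Scores = Mapping[str, Mapping[str, float]]


def one_vs_all_labels(labels: Sequence[Hashable], genre: Hashable) -> List[int]:
    """Binary targets for one genre's classifier: +1 for that genre, -1 for all others."""
    return [1 if y == genre else -1 for y in labels]


def fuse_sum_rule(scores: Scores, weights: Optional[Mapping[str, float]] = None) -> str:
    """Sum (optionally weighted) per-feature genre scores and return the best genre."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, per_genre in scores.items():
        w = 1.0 if weights is None else weights.get(feature, 1.0)
        for genre, p in per_genre.items():
            totals[genre] += w * p
    if not totals:
        raise ValueError("no classifier outputs to fuse")
    return max(totals, key=totals.get)
```

A sum rule is only one option; a product rule, majority voting, or a second-level classifier trained on the stacked scores are common alternatives, and the weights could be tuned on held-out data.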

Notes

Courtesy of RAI, displayed for research purposes, all rights are reserved.
Courtesy of INA, displayed for research purposes, all rights are reserved.

References

Borth D et al (2009) TubeFiler—an automatic web video categorizer. In: Proc. of ACM multimedia. Beijing, China, pp 1111–1112
Campbell M et al (2006) IBM research TRECVID-2006 video retrieval system. In: Proc. of NIST TRECVID workshop, Gaithersburg, USA
Cao J, Zhang YD, Song YC, Chen ZN, Zhang X, Li JT (2009) MCG-WEBV: a benchmark dataset for web video analysis. Technical Report, ICT-MCG-09-001, Institute of Computing Technology
Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
Ekenel HK, Fischer M, Gao H, Kilgour K, Marcos JS, Stiefelhagen R (2007) Universität Karlsruhe (TH) at TRECVID 2007. In: Proc. of NIST TRECVID workshop, Gaithersburg, MD
Ekenel HK, Gao H, Stiefelhagen R (2008) Universität Karlsruhe (TH) at TRECVID 2008. In: Proc. of NIST TRECVID workshop, Gaithersburg, MD
Fischer S, Lienhart R, Effelsberg W (1995) Automatic recognition of film genres. In: Proc. of ACM multimedia, San Francisco, USA, pp 295–304
Huang J, Kumar SR, Mitra M, Zhu W-J, Zabih R (1997) Image indexing using color correlograms. In: Proc. IEEE conf. computer vision and pattern recognition, San Juan, pp 762–768
Lin H-T, Lin C-J, Weng RC (2007) A note on Platt’s probabilistic outputs for support vector machines. Mach Learn 68(3):267–276
Lu L, Zhang H-J, Li SZ (2003) Content-based audio classification and segmentation by using support vector machines. Multimed Syst 8(6):482–492
Montagnuolo M, Messina A (2007) TV genre classification using multimodal information and multilayer perceptrons. AIIA, LNAI 4733:730–741
Montagnuolo M, Messina A (2008) Fuzzy mining of multimedia genre applied to television archives. In: Proc. of IEEE intl. conference on multimedia and expo, pp 117–120
Montagnuolo M, Messina A (2009) Parallel neural networks for multimodal video genre classification. Multimed Tools Appl 41(1):125–159
Multimedia Grand Challenge (2009, 2010) http://comminfo.rutgers.edu/conferences/mmchallenge/
Quaero Programme website (2011) http://www.quaero.org/
Saunders J (1996) Real-time discrimination of broadcast speech/music. In: Proceedings of the acoustics, speech, and signal processing conference, Washington, pp 993–996
Song Y, Zhang Y, Zhang X, Cao J, Li J (2009) Google challenge: incremental-learning for web video categorization on robust semantic feature space. In: Proc. of ACM multimedia, Beijing, China, pp 1113–1114
Song Y, Zhao M, Yagnik J, Wu X (2010) Taxonomic classification for web-based videos. In: Proc. of computer vision and pattern recognition (CVPR), pp 871–878
Stricker M, Orengo M (1995) Similarity of color images. In: Proc. SPIE storage and retrieval for image and video databases, vol 2420, San Jose, USA, pp 381–392
Swain MJ, Ballard DH (1991) Color indexing. Int J Comput Vis 7(1):11–32
Talkin D (1995) A robust algorithm for pitch tracking (RAPT). In: Speech coding & synthesis, pp 495–518
Tzanetakis G, Cook P (2002) Musical genre classification of audio signals. IEEE Trans Speech Audio Process 10(5):293–302
Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154
VOICEBOX Speech processing toolbox for MATLAB (2011) http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
Wang J, Xu C, Chng E (2006) Automatic sports video genre classification using Pseudo-2D-HMM. In: Proc. of intl. conf. on pattern recognition, Washington DC, USA, pp 778–781
Wang Z, Zhao M, Song Y, Kumar S, Li B (2010) YouTubeCat: learning to categorize wild web videos. In: Proc. of computer vision and pattern recognition (CVPR), pp 879–886
Wu T-F, Lin C-J, Weng RC (2004) Probability estimates for multi-class classification by pairwise coupling. J Mach Learn Res 5:975–1005
Wu X, Zhao WL, Ngo CW (2009) Towards google challenge: combining contextual and social information for web video categorization. In: Proc. of ACM multimedia, Beijing, China, pp 1109–1110
Yang L, Liu J, Yang X, Hua XS (2007) Multi-modality web video categorization. In: Proc. of multimedia information retrieval, MIR ’07, Augsburg, Germany, pp 265–274

Acknowledgements

The authors would like to thank Alberto Messina and Maurizio Montagnuolo from the RAI Centre for Research and Technological Innovation for their contributions to the study and for providing the TV program data. The authors would also like to thank INA (the French National Audiovisual Institute) for providing the corpus used in the Quaero evaluations. This study was funded by OSEO, the French State agency for innovation, as part of the Quaero Programme.

Author information

Authors and Affiliations

Institute of Anthropomatics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Hazım Kemal Ekenel & Tomas Semela

Corresponding author

Correspondence to Hazım Kemal Ekenel.

About this article

Cite this article

Ekenel, H.K., Semela, T. Multimodal genre classification of TV programs and YouTube videos. Multimed Tools Appl 63, 547–567 (2013). https://doi.org/10.1007/s11042-011-0923-x

Issue Date: March 2013
