Abstract
The objective of the work reported here is to provide automatic, context-of-capture-based categorization, structure detection, and segmentation of news broadcasts using a multimodal, semantics-based approach. We assume that news broadcasts can be described with context-free grammars that specify their structural characteristics. We propose a system consisting of two main types of interoperating units: a recognizer unit, which comprises several modules, and a parser unit. The recognizer modules (audio, video, and semantic recognizers) analyze the telecast, and each identifies hypothesized instances of features in the audiovisual input. A probabilistic parser analyzes the identifications provided by the recognizers. The grammar represents the possible structures a news telecast may have, so the parser can identify the exact structure of the analyzed telecast.
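The interplay between recognizer hypotheses and a probabilistic grammar can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and does not come from the paper: the grammar symbols (`NEWS`, `BODY`, `STORY`), the feature labels (`intro`, `anchor`, `report`, `outro`), the rule probabilities, and the confidence values. A simple CYK-style Viterbi parser over a binary grammar stands in here for whatever probabilistic parser the actual system uses.

```python
from collections import defaultdict

# Hypothetical binary grammar: (parent, left, right) -> rule probability.
# Lowercase symbols are feature labels emitted by the recognizers.
RULES = {
    ("NEWS", "intro", "BODY"): 1.0,
    ("BODY", "STORY", "BODY"): 0.6,
    ("BODY", "STORY", "outro"): 0.4,
    ("STORY", "anchor", "report"): 1.0,
}


def viterbi_parse(hypotheses, start="NEWS"):
    """Return (probability, tree) of the best parse, or None if none exists.

    `hypotheses` is a list with one dict per segment, mapping each
    hypothesized feature label to the recognizer's confidence in it.
    """
    n = len(hypotheses)
    # chart[(i, j)][symbol] = (best score, backpointer or None for leaves)
    chart = defaultdict(dict)
    for i, labels in enumerate(hypotheses):
        for label, conf in labels.items():
            chart[(i, i + 1)][label] = (conf, None)

    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in RULES.items():
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = p * chart[(i, k)][left][0] * chart[(k, j)][right][0]
                        best = chart[(i, j)].get(parent, (0.0, None))[0]
                        if score > best:
                            chart[(i, j)][parent] = (score, (k, left, right))

    if start not in chart[(0, n)]:
        return None

    def build(i, j, sym):
        # Rebuild the tree as nested (symbol, start, end, children) tuples.
        _, back = chart[(i, j)][sym]
        if back is None:
            return (sym, i, j, [])
        k, left, right = back
        return (sym, i, j, [build(i, k, left), build(k, j, right)])

    return chart[(0, n)][start][0], build(0, n, start)


def story_segments(tree):
    """Collect the (start, end) spans of all STORY nodes in a parse tree."""
    sym, i, j, children = tree
    found = [(i, j)] if sym == "STORY" else []
    for child in children:
        found.extend(story_segments(child))
    return found
```

Given one dictionary of competing label hypotheses per segment, `viterbi_parse` selects the labeling that is both well supported by the recognizers and licensed by the grammar, and `story_segments` reads the story boundaries off the resulting tree. In this sketch an ambiguous segment (for example one scored as both `anchor` and `report`) is resolved by the grammar rather than by the recognizer confidence alone.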
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Jacobs, A., Ioannidis, G.T., Christodoulakis, S., Moumoutzis, N., Georgoulakis, S., Papachristoudis, Y. (2007). Automatic, Context-of-Capture-Based Categorization, Structure Detection and Segmentation of News Telecasts. In: Thanos, C., Borri, F., Candela, L. (eds) Digital Libraries: Research and Development. DELOS 2007. Lecture Notes in Computer Science, vol 4877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77088-6_27
DOI: https://doi.org/10.1007/978-3-540-77088-6_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77087-9
Online ISBN: 978-3-540-77088-6
eBook Packages: Computer Science, Computer Science (R0)