Abstract.
We present a multimedia information analysis framework for content-based browsing of video. Specifically, we develop algorithms that automatically extract highlights from sports video using audio, text, and image features. The extracted annotations are used to build applications for selective browsing of sports videos. Such summarization techniques enable content-based indexing of multimedia documents for efficient storage and retrieval. In addition, in the context of the newly emerging MPEG-7 standard, these methods will enable applications that use MPEG-7 descriptions. Because the standard provides only the syntax for representing such descriptions and not specific algorithms for extracting them, these algorithms are of great value for establishing MPEG-7 as an accepted standard. We provide experimental results for the proposed algorithms on several hours of sports programs that demonstrate the feasibility of efficient video access techniques in a multimedia environment.
Correspondence to: Serhan Dagtas
Cite this article
Dagtas, S., Abdel-Mottaleb, M. Multimodal detection of highlights for multimedia content. Multimedia Systems 9, 586–593 (2004). https://doi.org/10.1007/s00530-003-0130-3
DOI: https://doi.org/10.1007/s00530-003-0130-3