Multimodal detection of highlights for multimedia content

Multimedia Systems

Abstract.

We present a multimedia information analysis framework for content-based browsing of video. Specifically, we develop algorithms that automatically extract highlights from sports video using audio, text, and image features. The extracted annotations are used to build applications for selective browsing of sports videos. Such summarization techniques enable content-based indexing of multimedia documents for efficient storage and retrieval. In addition, in the context of the newly emerging MPEG-7 standard, these methods will enable applications that use MPEG-7 descriptions. Because the standard provides only the syntax for representing such descriptions and no specific algorithms for extracting them, extraction algorithms of this kind are of great value for establishing MPEG-7 as an accepted standard. We provide experimental results for the proposed algorithms on several hours of sports programs that demonstrate the feasibility of efficient video access techniques in a multimedia environment.

Author information

Corresponding author

Correspondence to Serhan Dagtas.

About this article

Cite this article

Dagtas, S., Abdel-Mottaleb, M. Multimodal detection of highlights for multimedia content. Multimedia Systems 9, 586–593 (2004). https://doi.org/10.1007/s00530-003-0130-3

  • DOI: https://doi.org/10.1007/s00530-003-0130-3
