Abstract
We present our studies on the application of Coupled Hidden Markov Models (CHMMs) to sports highlights extraction from broadcast video using both audio and video information. First, we generate audio labels using audio classification via Gaussian mixture models, and video labels using quantization of the average motion vector magnitudes. Then, we model sports highlights using discrete-observation CHMMs on audio and video labels obtained from a large training set of broadcast sports highlights. Our experimental results on unseen golf and soccer content show that CHMMs outperform Hidden Markov Models (HMMs) trained on audio-only or video-only observations. Next, we study how the coupling between the two single-modality HMMs improves modelling capability by refining the states of the models. We also show that the number of states optimized in this fashion gives better classification results than other numbers of states. We conclude that CHMMs provide a promising tool for information fusion techniques in the sports domain for audio-visual event detection and analysis.
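The pipeline summarized above can be illustrated with a minimal sketch, which is not the implementation used in the study. It assumes a two-chain CHMM in which each chain's next state depends on the previous states of both chains, and it scores a pair of discrete label sequences with a scaled forward pass over the joint state. The function names, the quantization thresholds, the state and label counts, and all probability values are hypothetical and chosen only for illustration; in practice the audio labels would come from a GMM classifier and the parameters from training.

```python
import math
from itertools import product


def quantize_motion(magnitudes, thresholds):
    """Map each average motion-vector magnitude to a label in 0..len(thresholds)."""
    return [sum(m >= t for t in thresholds) for m in magnitudes]


def chmm_log_likelihood(audio_obs, video_obs, pi_a, pi_v,
                        trans_a, trans_v, emit_a, emit_v):
    """Log-likelihood of paired label sequences under a two-chain CHMM.

    trans_a[ap][vp][a] = P(a_t = a | a_{t-1} = ap, v_{t-1} = vp)
    trans_v[ap][vp][v] = P(v_t = v | a_{t-1} = ap, v_{t-1} = vp)
    emit_a[a][label] and emit_v[v][label] are discrete emission tables.
    """
    if len(audio_obs) != len(video_obs) or not audio_obs:
        raise ValueError("need two non-empty sequences of equal length")
    states = list(product(range(len(pi_a)), range(len(pi_v))))
    # Initialization: the two chains start independently.
    alpha = {(a, v): pi_a[a] * pi_v[v]
             * emit_a[a][audio_obs[0]] * emit_v[v][video_obs[0]]
             for a, v in states}
    log_lik = 0.0
    for t in range(len(audio_obs)):
        if t > 0:
            prev = alpha
            alpha = {}
            for a, v in states:
                # Coupling: both transitions condition on the previous joint state.
                reach = sum(prev[(ap, vp)] * trans_a[ap][vp][a] * trans_v[ap][vp][v]
                            for ap, vp in states)
                alpha[(a, v)] = reach * emit_a[a][audio_obs[t]] * emit_v[v][video_obs[t]]
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")
        # Rescale at every step to avoid numerical underflow on long sequences.
        log_lik += math.log(scale)
        alpha = {k: x / scale for k, x in alpha.items()}
    return log_lik


# Hypothetical toy model: 2 audio states, 2 video states,
# 2 audio labels, 3 video labels (motion quantized with 2 thresholds).
PI_A = [0.6, 0.4]
PI_V = [0.5, 0.5]
TRANS_A = [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.5, 0.5]]]
TRANS_V = [[[0.9, 0.1], [0.3, 0.7]], [[0.6, 0.4], [0.1, 0.9]]]
EMIT_A = [[0.8, 0.2], [0.3, 0.7]]
EMIT_V = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]

video_labels = quantize_motion([0.1, 0.6, 2.0], thresholds=[0.5, 1.5])
audio_labels = [0, 1, 1]  # hypothetical audio class ids, one per time step
score = chmm_log_likelihood(audio_labels, video_labels,
                            PI_A, PI_V, TRANS_A, TRANS_V, EMIT_A, EMIT_V)
```

One plausible way to use such a score, again as an assumption rather than a description of the experiments, is to train one model on highlight segments and another on non-highlight segments and assign a test segment to whichever model gives the higher log-likelihood.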
Cite this article
Xiong, Z. Audio-visual sports highlights extraction using Coupled Hidden Markov Models. Pattern Anal Applic 8, 62–71 (2005). https://doi.org/10.1007/s10044-005-0244-7
DOI: https://doi.org/10.1007/s10044-005-0244-7