HMM-Based Audio Keyword Generation

Xu, Min; Duan, Ling-Yu; Cai, Jianfei; Chia, Liang-Tien; Xu, Changsheng; Tian, Qi

doi:10.1007/978-3-540-30543-9_71

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3333)

Abstract

With the exponential growth in the production of multimedia data, there is an increasing need for video semantic analysis. Audio, as a significant part of video, provides important cues to human perception when people browse and try to understand video content. To detect semantic content from useful audio information, we introduce audio keywords, which are sets of specific sounds related to semantic events. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. A weakness of that work is that audio signals are artificially segmented into 20 ms frames for frame-based SVM identification, without any contextual information. In this paper, we propose a classification method based on the Hidden Markov Model (HMM) for audio keyword identification, as an improvement over the hierarchical SVM classifier. The choice of HMM is motivated by its success in speech recognition. Unlike frame-based SVM classification followed by majority voting, our proposed HMM-based classifiers treat a specific sound as continuous time-series data and employ hidden state transitions to capture contextual information. In particular, we study how to find an effective HMM, i.e., how to determine the topology, observation vectors and statistical parameters of the HMM. We also compare different HMM structures with different hidden states, and adjust time-series data of variable length. The experimental data comprise 40 minutes of basketball audio from real-time sports games. Experimental results show that, for audio keyword generation, the proposed HMM-based method outperforms the previous hierarchical SVM.
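The contrast drawn in the abstract, between scoring a whole sound segment with one HMM per audio keyword and voting over independent per-frame decisions, can be illustrated with a minimal sketch. It is not the authors' implementation. The keyword names, the one-dimensional feature values, the two-state left-to-right topology and all model parameters below are hypothetical, chosen only to show the mechanics. Each keyword model is a Gaussian-emission HMM, the segment likelihood is computed with the forward algorithm in the log domain, and the segment is assigned to the keyword whose model scores highest.

```python
import math
from collections import Counter

NEG_INF = float("-inf")

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )

def forward_loglik(obs, hmm):
    """Log-likelihood of a variable-length sequence under one HMM (forward algorithm)."""
    n = len(hmm["pi"])
    emit = lambda x, j: log_gauss(x, hmm["means"][j], hmm["vars"][j])
    # Initialisation: start probability times emission of the first frame.
    alpha = [safe_log(hmm["pi"][j]) + emit(obs[0], j) for j in range(n)]
    # Recursion: sum over predecessor states, then emit the current frame.
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(hmm["A"][i][j]) for i in range(n)])
            + emit(x, j)
            for j in range(n)
        ]
    return logsumexp(alpha)

def classify_segment(obs, models):
    """Pick the keyword whose HMM gives the segment the highest likelihood."""
    return max(models, key=lambda name: forward_loglik(obs, models[name]))

def majority_vote(frame_labels):
    """Frame-based baseline: the most frequent per-frame label wins."""
    return Counter(frame_labels).most_common(1)[0][0]

# Hypothetical two-state left-to-right models (state 0 -> state 1, no way back).
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "whistle": {"pi": [1.0, 0.0], "A": LEFT_TO_RIGHT,
                "means": [[5.0], [3.0]], "vars": [[1.0], [1.0]]},
    "applause": {"pi": [1.0, 0.0], "A": LEFT_TO_RIGHT,
                 "means": [[0.0], [1.0]], "vars": [[1.0], [1.0]]},
}

# Two segments of different lengths; no padding or truncation is needed.
segment_a = [[5.1], [4.8], [3.2], [2.9]]
segment_b = [[0.1], [-0.2], [0.3], [1.1], [0.9], [1.2]]
label_a = classify_segment(segment_a, models)
label_b = classify_segment(segment_b, models)
```

Because the forward recursion runs once per frame, sequences of any length are scored by the same model, and the transition matrix is what lets the order of frames influence the decision. `majority_vote` is included only as the frame-based counterpart: it ignores frame order entirely.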

This research is partially supported by Singapore A*STAR SERC Grant (032 101 0006).

Author information

Authors and Affiliations

School of Computer Engineering, Nanyang Technological University, Singapore, 639798
Min Xu, Jianfei Cai & Liang-Tien Chia
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore, 119613
Ling-Yu Duan, Changsheng Xu & Qi Tian

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, 113-8656, Tokyo, Japan
Kiyoharu Aizawa
Tokyo Research Laboratory, IBM Research, 1623-14 Shimo-tsuruma, Yamato, 242-0001, Kanagawa, Japan
Yuichi Nakamura
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh

Cite this paper

Xu, M., Duan, LY., Cai, J., Chia, LT., Xu, C., Tian, Q. (2004). HMM-Based Audio Keyword Generation. In: Aizawa, K., Nakamura, Y., Satoh, S. (eds) Advances in Multimedia Information Processing - PCM 2004. PCM 2004. Lecture Notes in Computer Science, vol 3333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30543-9_71

DOI: https://doi.org/10.1007/978-3-540-30543-9_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23985-7
Online ISBN: 978-3-540-30543-9
eBook Packages: Computer Science, Computer Science (R0)
