Audio Content Analysis for Understanding Structures of Scene in Video

Kang, Chan-Mi; Baek, Joong-Hwan

doi:10.1007/11816157_151

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4113)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

In this paper, we propose a system that categorizes audio into seven classes. As classification features, we use the mean and variance of the root mean square (RMS), zero-crossing rate (ZCR), fundamental frequency, and frequency peak, each extracted from every 25 ms frame. In addition to audio content classification, we perform speaker identification on voice sequences that are extracted automatically using our proposed method. The proposed scheme reaches an accuracy of 93.8% in categorizing audio signals and 80% in speaker identification.
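The frame-level feature statistics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only RMS and ZCR (fundamental frequency and frequency peak are omitted), it assumes non-overlapping 25 ms frames and drops a trailing partial frame, and the function names `split_frames`, `rms`, `zcr`, and `clip_features` are hypothetical.

```python
import math
from statistics import mean, pvariance


def split_frames(samples, sample_rate, frame_ms=25):
    """Split samples into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Assumption of this sketch: a trailing partial frame is discarded.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def rms(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def clip_features(samples, sample_rate, frame_ms=25):
    """Mean and (population) variance of per-frame RMS and ZCR for a clip."""
    frames = split_frames(samples, sample_rate, frame_ms)
    if not frames:
        raise ValueError("clip is shorter than one frame")
    rms_values = [rms(f) for f in frames]
    zcr_values = [zcr(f) for f in frames]
    return {
        "rms_mean": mean(rms_values),
        "rms_var": pvariance(rms_values),
        "zcr_mean": mean(zcr_values),
        "zcr_var": pvariance(zcr_values),
    }


# Usage: one second of a 440 Hz sine sampled at 8 kHz (synthetic test signal).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = clip_features(tone, sample_rate)
```

For a steady tone like this one, the per-frame RMS barely changes, so its variance is close to zero; signals such as speech, which alternate between voiced segments and pauses, would be expected to show a larger spread. A classifier would take a feature vector of this kind as input, but the choice of classifier is outside what the abstract states.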

Author information

Authors and Affiliations

Multimedia Retrieval Lab. in School of Electronics and Communication Engineering, Hankuk Aviation University
Chan-Mi Kang
School of Electronics and Communication Engineering, Hankuk Aviation University
Joong-Hwan Baek

Editor information

Editors and Affiliations

Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui, China
De-Shuang Huang
Carnegie Mellon University
Kang Li
School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Stranmillis Road, BT9 5AH, Belfast, UK
George William Irwin

About this paper

Cite this paper

Kang, CM., Baek, JH. (2006). Audio Content Analysis for Understanding Structures of Scene in Video. In: Huang, DS., Li, K., Irwin, G.W. (eds) Intelligent Computing. ICIC 2006. Lecture Notes in Computer Science, vol 4113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816157_151

DOI: https://doi.org/10.1007/11816157_151
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37271-4
Online ISBN: 978-3-540-37273-8
eBook Packages: Computer Science, Computer Science (R0)