Audio Content Analysis for Understanding Structures of Scene in Video

  • Conference paper
Intelligent Computing (ICIC 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4113)

Abstract

In this paper, we propose a system that classifies audio into seven classes. As classification features, we use the mean and variance of the root mean square (RMS) value, zero-crossing rate (ZCR), fundamental frequency, and frequency peak, each extracted from every 25 ms frame. In addition to audio content classification, we perform speaker identification on voice sequences that are extracted automatically by the proposed method. The proposed scheme reaches an accuracy of 93.8% in audio classification and 80% in speaker identification.
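The frame-level statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only RMS and ZCR (fundamental frequency and frequency peak are omitted), and it assumes non-overlapping 25 ms frames, population variance, and a ZCR normalised by the number of adjacent sample pairs. The function names (`split_frames`, `frame_rms`, `frame_zcr`, `mean_var`, `clip_features`) are hypothetical and chosen for illustration.

```python
import math


def split_frames(signal, sample_rate, frame_ms=25):
    """Split a sample list into non-overlapping frames (no overlap is an assumption)."""
    n = int(sample_rate * frame_ms / 1000)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    # Drop any trailing samples that do not fill a complete frame.
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def frame_rms(frame):
    """Root mean square value of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ (assumed normalisation)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mean_var(values):
    """Mean and population variance of a non-empty list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


def clip_features(signal, sample_rate, frame_ms=25):
    """Mean and variance of per-frame RMS and ZCR for one audio clip."""
    frames = split_frames(signal, sample_rate, frame_ms)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    rms_mean, rms_var = mean_var([frame_rms(f) for f in frames])
    zcr_mean, zcr_var = mean_var([frame_zcr(f) for f in frames])
    return {
        "rms_mean": rms_mean,
        "rms_var": rms_var,
        "zcr_mean": zcr_mean,
        "zcr_var": zcr_var,
    }


if __name__ == "__main__":
    # Synthetic test tone: one second of a 440 Hz sine sampled at 8 kHz.
    sr = 8000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    print(clip_features(tone, sr))
```

For a steady tone like the one in the usage example, the per-frame values barely change, so both variances come out close to zero; signals such as speech, which alternate between voiced, unvoiced, and silent segments, would be expected to show larger variances, which is what makes these statistics plausible as classification features.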

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kang, CM., Baek, JH. (2006). Audio Content Analysis for Understanding Structures of Scene in Video. In: Huang, DS., Li, K., Irwin, G.W. (eds) Intelligent Computing. ICIC 2006. Lecture Notes in Computer Science, vol 4113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816157_151

  • DOI: https://doi.org/10.1007/11816157_151

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-37271-4

  • Online ISBN: 978-3-540-37273-8

  • eBook Packages: Computer Science, Computer Science (R0)
