Part of the book series: Studies in Computational Intelligence ((SCI,volume 83))

In this chapter, we review the basic methods for audio signal processing, mainly from the point of view of audio classification. General properties of audio signals are discussed, followed by a description of time-frequency representations for audio. Features useful for classification are reviewed. Finally, prominent examples of audio classification systems are discussed, with particular emphasis on feature extraction.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rao, P. (2008). Audio Signal Processing. In: Prasad, B., Prasanna, S.R.M. (eds) Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Studies in Computational Intelligence, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75398-8_8

  • DOI: https://doi.org/10.1007/978-3-540-75398-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75397-1

  • Online ISBN: 978-3-540-75398-8

  • eBook Packages: Engineering, Engineering (R0)
