Model of Auditory Filters and MPEG-7 Descriptors in Sound Recognition

Świercz, Aneta; Żera, Jan

doi:10.1007/978-3-319-09912-5_17


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8610)

Included in the following conference series:

International Conference on Active Media Technology


Abstract

This study examined whether applying a model of the human auditory filter could improve sound recognition based on MPEG-7 standard audio descriptors. Filtering in the auditory system was modeled with a bank of 38 gammatone filters closely spaced across the audible frequency range. The filter bank was implemented as a low-level audio descriptor that replaces the MPEG-7 audio descriptor based on the short-term Fourier transform (STFT). Sound recognition tests were conducted on a large set of sounds from nine musical instruments and speech from twelve speakers. The proposed gammatone-based descriptor improved recognition of both musical instruments and speakers compared with the original STFT-based low-level MPEG-7 audio descriptor.
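As an illustration only, the idea of replacing an STFT front end with a gammatone filter bank can be sketched as below. This is not the authors' implementation. Only the count of 38 channels comes from the abstract. Everything else is an assumption made for the sketch: the ERB-rate spacing (the commonly used Glasberg and Moore formulas), the 4th-order gammatone impulse response with bandwidth factor 1.019, the frequency limits `f_low` and `f_high`, the frame length and hop, and all function names.

```python
import numpy as np

def erb_number(f_hz):
    # ERB-rate scale (Glasberg & Moore form); an assumed spacing rule.
    return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

def erb_number_to_hz(e):
    # Inverse of erb_number.
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

def erb_bandwidth(f_hz):
    # Equivalent rectangular bandwidth in Hz at frequency f_hz.
    return 24.7 * (4.37e-3 * f_hz + 1.0)

def center_frequencies(n_channels=38, f_low=50.0, f_high=7000.0):
    # Channels equally spaced on the ERB-rate scale (limits are assumptions).
    e = np.linspace(erb_number(f_low), erb_number(f_high), n_channels)
    return erb_number_to_hz(e)

def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    # Gammatone impulse response: t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)
    t = np.arange(int(duration * fs)) / fs
    env = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb_bandwidth(fc) * t)
    ir = env * np.cos(2.0 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))  # normalize to unit energy

def gammatone_band_energies(x, fs, n_channels=38, f_low=50.0, f_high=7000.0,
                            frame_len=1024, hop=512):
    # Returns (center frequencies, energies[frame, channel]).
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    fcs = center_frequencies(n_channels, f_low, f_high)
    n_frames = 1 + (len(x) - frame_len) // hop
    energies = np.empty((n_frames, n_channels))
    for k, fc in enumerate(fcs):
        ir = gammatone_ir(fc, fs)
        n = len(x) + len(ir) - 1
        # FFT-based linear convolution, truncated to the input length (causal).
        y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(ir, n), n)[:len(x)]
        for i in range(n_frames):
            seg = y[i * hop:i * hop + frame_len]
            energies[i, k] = np.sum(seg ** 2)
    return fcs, energies
```

In this sketch, each row of `energies` plays the role that one STFT-derived spectral frame would play in a descriptor pipeline. For a pure tone, the channel whose center frequency lies closest to the tone is expected to carry the most energy. The default `f_high` of 7000 Hz suits a 16 kHz sampling rate and would need to be raised for full-bandwidth audio.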



Author information

Authors and Affiliations

Institute of Radioelectronics, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665, Warsaw, Poland
Aneta Świercz & Jan Żera


Editor information

Editors and Affiliations

University of Warsaw and Infobright Inc., Poland
Dominik Ślęzak
Department of Computer Science, Loughborough University, Loughborough, U.K.
Gerald Schaefer
Computer Science Department, University of British Columbia, 2366 Main Mall, P.O. Box, Vancouver, B.C., Canada
Son T. Vuong
Department of Information & Communication Engineering, Inha University, Korea
Yoo-Sung Kim


About this paper

Cite this paper

Świercz, A., Żera, J. (2014). Model of Auditory Filters and MPEG-7 Descriptors in Sound Recognition. In: Ślęzak, D., Schaefer, G., Vuong, S.T., Kim, YS. (eds) Active Media Technology. AMT 2014. Lecture Notes in Computer Science, vol 8610. Springer, Cham. https://doi.org/10.1007/978-3-319-09912-5_17


DOI: https://doi.org/10.1007/978-3-319-09912-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09911-8
Online ISBN: 978-3-319-09912-5
eBook Packages: Computer Science, Computer Science (R0)
