
Nearest-Neighbor Automatic Sound Annotation with a WordNet Taxonomy


Abstract

Sound engineers need to access vast collections of sound effects for their film and video productions. Sound-effects providers rely on text-retrieval techniques to give access to their collections. Currently, audio content is annotated manually, which is an arduous task. Automatic annotation methods, normally fine-tuned to narrow domains such as musical instruments or limited sound-effects taxonomies, are not yet mature enough to label arbitrary sounds in detail. A general sound recognition tool would require, first, a taxonomy that represents the world and, second, thousands of classifiers, each specialized in distinguishing fine details. We report experimental results on a general sound annotator. To tackle the taxonomy-definition problem we use WordNet, a semantic network that organizes real-world knowledge. To avoid the need for a huge number of classifiers to distinguish many different sound classes, we use a nearest-neighbor classifier with a database of isolated sounds unambiguously linked to WordNet concepts. A concept-prediction accuracy of 30% is achieved on a database of over 50,000 sounds and over 1,600 concepts.
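To make the nearest-neighbor scheme concrete, the sketch below shows one way such an annotator could look. It is a minimal illustration, assuming librosa for MFCC extraction and using a frame-averaged MFCC vector with Euclidean distance; these feature and distance choices, and the database layout, are assumptions for the sketch, not the paper's exact pipeline.

```python
# Minimal sketch of nearest-neighbor sound annotation against a labeled
# database, in the spirit of the paper. Assumes librosa for MFCC
# extraction; the feature summary (mean MFCC vector) and Euclidean
# distance are illustrative, not the authors' exact choices.
import numpy as np
import librosa

def mfcc_fingerprint(path, n_mfcc=13):
    """Summarize a sound file as the mean MFCC vector over all frames."""
    y, sr = librosa.load(path, sr=None, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def annotate(query_path, database):
    """Return the WordNet concepts of the closest database sound.

    `database` is a list of (feature_vector, concept_ids) pairs, where
    concept_ids are the WordNet synsets assigned by hand to each sound
    (a hypothetical structure for this sketch).
    """
    q = mfcc_fingerprint(query_path)
    dists = [np.linalg.norm(q - feats) for feats, _ in database]
    return database[int(np.argmin(dists))][1]

# Usage: build the database once from annotated sounds, then label queries.
# database = [(mfcc_fingerprint(p), concepts) for p, concepts in annotated]
# predicted = annotate("query.wav", database)
```

Because the query simply inherits the WordNet synsets of its nearest neighbor, a single classifier covers every concept in the taxonomy; growing the vocabulary means adding annotated sounds, not training new models.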



Author information

Corresponding author

Correspondence to Pedro Cano.

Additional information

Parts of this paper were published in the Proceedings of the 2004 IEEE International Workshop on Machine Learning for Signal Processing.


Cite this article

Cano, P., Koppenberger, M., Groux, S.L. et al. Nearest-Neighbor Automatic Sound Annotation with a WordNet Taxonomy. J Intell Inf Syst 24, 99–111 (2005). https://doi.org/10.1007/s10844-005-0318-4
