Indexation Audio: un état de l’art

State of the art in audio indexing

Annales Des Télécommunications

Résumé

À l’heure actuelle, nous disposons d’une quantité d’informations audio à la fois importante et grandissante par le biais des bases de données publiques ou privées (sites Internet, cédéroms, INA, SACEM) et des contenus télé et radiodiffusés. La description par mots-clés, jusqu’ici utilisée, est peu adaptée à la richesse de cette information, puisqu’elle entraîne une indexation subjective et coûteuse (à cause de l’importante intervention humaine). Le domaine de l’indexation audio tente donc de répondre au besoin d’outils (semi-)automatiques de description de contenus audio afin d’en améliorer l’accès. Cet article propose un état de l’art de l’indexation audio, à travers la description de techniques liées à la discrimination en classes (plus ou moins grossières), ainsi qu’à la présentation des analyses spécifiques aux deux grandes classes que sont la parole et la musique (cette dernière étant largement privilégiée). Des comparatifs concernant les performances des systèmes existants y sont présentés, ainsi que l’adresse de sites Internet proposant des démonstrations.

Abstract

Nowadays, a large and growing quantity of audio information is available through public or private databases (Internet sites, CD-ROMs, the French National Audiovisual Institute (INA), music copyright protection societies such as SACEM) and TV/radio broadcasts. Keyword description, the approach used until now, is poorly suited to the richness of this information, because it makes indexing subjective and costly (both due to substantial human intervention). Research in audio indexing therefore aims to fulfil the need for (semi-)automatic tools for audio content description, in order to improve access to audio documents. This article reviews the state of the art in audio indexing by describing techniques for discriminating between (more or less broad) classes, and by reviewing the analyses specific to the two main classes, speech and music (with more focus on the latter). Comparisons of the performance of existing systems are presented, as well as the addresses of Internet sites offering demonstrations.


Author information

Correspondence to Matthieu Carré or Pierrick Philippe.

Cite this article

Carré, M., Philippe, P. Indexation Audio: un état de l’art. Ann. Télécommun. 55, 507–525 (2000). https://doi.org/10.1007/BF02995205
