Drum Loops Retrieval from Spoken Queries

Journal of Intelligent Information Systems

Abstract

Recent efforts in audio indexing and music information retrieval have mostly focused on melody. While this is appropriate for polyphonic music signals, specific approaches are needed for systems dealing with percussive audio signals such as those produced by drums, tabla or djembé. In this article, we present a complete system for managing a database of drum patterns (or drum loops). Queries to this database are formulated with spoken onomatopoeias, short meaningless words imitating the different sounds of the drum kit. The transcription task necessary to index the database is performed using Hidden Markov Models (HMM) and Support Vector Machines (SVM) and achieves an 86.4% correct recognition rate. The syllables of spoken queries are recognized, and a suitable statistical model then compares and aligns the query with the rhythmic sequences stored in the database in order to return a set of the most relevant drum loops.
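To make the retrieval step concrete, the following is a minimal sketch of how a recognized syllable sequence could be matched against transcribed drum loops. It is not the statistical model used in the article: as a stand-in, it ranks loops by plain Levenshtein edit distance. The syllable vocabulary (`boom`, `tchak`, `ts`), the drum labels (`bd`, `sd`, `hh`), the loop names and the function names are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from spoken onomatopoeias to drum-event labels
# (bd = bass drum, sd = snare drum, hh = hi-hat). Not taken from the article.
SYLLABLE_TO_DRUM: Dict[str, str] = {"boom": "bd", "tchak": "sd", "ts": "hh"}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def rank_loops(query_syllables: Sequence[str],
               database: Dict[str, List[str]],
               top_k: int = 3) -> List[Tuple[str, int]]:
    """Return the top_k loops closest to the spoken query, best first."""
    query = [SYLLABLE_TO_DRUM[s] for s in query_syllables]
    scored = [(edit_distance(query, seq), name) for name, seq in database.items()]
    scored.sort()
    return [(name, dist) for dist, name in scored[:top_k]]


# Toy database of already-transcribed loops (hypothetical content).
DATABASE: Dict[str, List[str]] = {
    "rock_basic": ["bd", "hh", "sd", "hh", "bd", "hh", "sd", "hh"],
    "four_on_floor": ["bd", "hh", "bd", "hh", "bd", "hh", "bd", "hh"],
    "half_time": ["bd", "hh", "hh", "hh", "sd", "hh", "hh", "hh"],
}

results = rank_loops(
    ["boom", "ts", "tchak", "ts", "boom", "ts", "tchak", "ts"], DATABASE
)
for name, dist in results:
    print(name, dist)
```

An edit distance treats every recognition error as equally likely. A probabilistic model of the kind the abstract describes could instead weight confusions between syllables and drum events differently, which this sketch does not attempt.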

Corresponding author

Correspondence to Gaël Richard.

About this article

Cite this article

Gillet, O., Richard, G. Drum Loops Retrieval from Spoken Queries. J Intell Inf Syst 24, 159–177 (2005). https://doi.org/10.1007/s10844-005-0321-9

  • DOI: https://doi.org/10.1007/s10844-005-0321-9