A fast audio similarity retrieval method for millions of music tracks

Multimedia Tools and Applications

Abstract

We present a filter-and-refine method to speed up nearest-neighbor searches with the Kullback–Leibler divergence for multivariate Gaussians. This combination of features and similarity measure is of special interest in automatic music recommendation, where it is widely used to compute music similarity. However, the non-vectorial features and the non-metric divergence make it difficult to use with large corpora, because standard indexing algorithms cannot be applied. This paper proposes a method for fast nearest-neighbor retrieval in large databases that rely on this combination. At its core, the method rescales the divergence and uses a modified FastMap implementation to speed up nearest-neighbor queries. Overall, the method accelerates the search for similar music pieces by a factor of 10–30 and yields high recall values of 95–99% compared to a standard linear search.
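The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the rescaling or the FastMap modifications, so the sketch assumes a symmetrized Kullback–Leibler divergence, a square-root rescaling, and plain FastMap with randomly chosen pivot pairs. All function names (`skl`, `fastmap`, `filter_and_refine`, `linear_search`, `random_gaussians`) and the synthetic data are hypothetical.

```python
import numpy as np

def skl(m1, S1, m2, S2):
    """Symmetrized KL divergence KL(p||q) + KL(q||p) of two Gaussians.

    The log-determinant terms of the two directed divergences cancel.
    """
    d = m1 - m2
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)
    val = 0.5 * (np.trace(P1 @ S2) + np.trace(P2 @ S1) + d @ (P1 + P2) @ d) - len(m1)
    return max(float(val), 0.0)  # clip tiny negative rounding errors

def fastmap(dist, n, k, rng):
    """Plain FastMap with random pivot pairs (an assumed simplification)."""
    X = np.zeros((n, k))

    def d2(i, j, c):
        # Squared distance left over after removing the first c coordinates.
        return max(dist(i, j) ** 2 - np.sum((X[i, :c] - X[j, :c]) ** 2), 0.0)

    for c in range(k):
        a, b = rng.choice(n, size=2, replace=False)
        dab = d2(a, b, c)
        if dab == 0.0:
            continue  # degenerate pivot pair, leave this coordinate at zero
        for i in range(n):
            # Project object i onto the line through the two pivots.
            X[i, c] = (d2(a, i, c) + dab - d2(b, i, c)) / (2.0 * np.sqrt(dab))
    return X

def filter_and_refine(q, X, dist, n_candidates, k):
    """Return the k nearest neighbors of object q among the filtered candidates."""
    # Filter: cheap Euclidean distances in the embedded vector space.
    approx = np.linalg.norm(X - X[q], axis=1)
    cand = np.argsort(approx)[:n_candidates]
    # Refine: exact (rescaled) divergence only for the surviving candidates.
    exact = np.array([dist(q, j) for j in cand])
    return cand[np.argsort(exact)[:k]]

def linear_search(q, n, dist, k):
    """Baseline: exact divergence against every object in the collection."""
    exact = np.array([dist(q, j) for j in range(n)])
    return np.argsort(exact)[:k]

def random_gaussians(n, dim, rng):
    """Synthetic stand-in for per-track Gaussian timbre models."""
    models = []
    for _ in range(n):
        A = rng.normal(size=(dim, dim))
        models.append((rng.normal(size=dim), A @ A.T + np.eye(dim)))
    return models

rng = np.random.default_rng(0)
models = random_gaussians(200, 4, rng)
# Assumed rescaling: square root of the symmetrized divergence.
dist = lambda i, j: np.sqrt(skl(*models[i], *models[j]))
X = fastmap(dist, len(models), 8, rng)
found = filter_and_refine(0, X, dist, n_candidates=40, k=10)
truth = linear_search(0, len(models), dist, k=10)
print("recall:", len(set(found) & set(truth)) / len(truth))
```

The filter step compares cheap Euclidean vectors, and only the `n_candidates` survivors are scored with the exact divergence. Raising `n_candidates` trades speed for recall; setting it to the collection size reproduces the linear search.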

Acknowledgements

This research is supported by the Austrian Research Fund (FWF) under grant L511-N15, and by the Austrian Research Promotion Agency (FFG) under project number 815474-BRIDGE.

Author information

Corresponding author

Correspondence to Dominik Schnitzer.

About this article

Cite this article

Schnitzer, D., Flexer, A. & Widmer, G. A fast audio similarity retrieval method for millions of music tracks. Multimed Tools Appl 58, 23–40 (2012). https://doi.org/10.1007/s11042-010-0679-8

  • DOI: https://doi.org/10.1007/s11042-010-0679-8
