DOI: 10.1145/2505515.2505658
research-article

SRbench: a benchmark for soundtrack recommendation systems

Published: 27 October 2013

ABSTRACT

In this work, we propose a benchmark to evaluate the retrieval performance of soundtrack recommendation systems. Such systems aim to find songs to be played as background music for a given set of images. The proposed benchmark is based on preference judgments, where relevance is treated as a continuous ordinal variable and judgments are collected for pairs of songs with respect to a query (i.e., a set of images). To capture a wide variety of songs and images, we use a large space of possible music genres, different emotions expressed through music, and various query-image themes. The benchmark consists of two types of relevance assessments: (i) judgments obtained from a user study, which serve as a "gold standard" for (ii) relevance judgments gathered through Amazon's Mechanical Turk. We report on the performance of two state-of-the-art soundtrack recommendation systems using the proposed benchmark.
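The abstract describes collecting relevance as pairwise preference judgments (song A preferred over song B for a given image set) rather than absolute grades. As a minimal sketch of how such judgments can be aggregated into a ranking, the following uses a simple win-count (Copeland-style) score; the function name and toy data are illustrative assumptions, not the paper's actual aggregation method.

```python
from collections import defaultdict

def rank_by_wins(preferences):
    """Aggregate pairwise preference judgments into a ranked list.

    `preferences` is a list of (winner, loser) pairs, each meaning an
    assessor preferred `winner` over `loser` as a soundtrack for the
    query. Songs are sorted by their number of pairwise wins; ties keep
    an arbitrary but deterministic order via the secondary name key.
    """
    wins = defaultdict(int)
    songs = set()
    for winner, loser in preferences:
        wins[winner] += 1
        songs.update((winner, loser))
    return sorted(songs, key=lambda s: (-wins[s], s))

# Toy example: three judgments comparing candidate songs for one query.
judged = [("songA", "songB"), ("songA", "songC"), ("songB", "songC")]
print(rank_by_wins(judged))  # songA (2 wins) > songB (1) > songC (0)
```

Win counting is only one way to turn preferences into a ranking; with inconsistent (cyclic) crowd judgments, more robust models such as Thurstone or Bradley-Terry scaling are commonly used instead.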


Published in

CIKM '13: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515

Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%. Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%.
