Triplet Convolutional Network for Music Version Identification

  • Conference paper
MultiMedia Modeling (MMM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Abstract

Music version identification has long been a difficult task in music information retrieval because versions of the same work can differ in tempo, key, and structure. Most existing methods rely on hand-crafted features, which demand extensive human effort and expert knowledge to design and leave little room for further breakthroughs. We therefore propose a triplet convolutional embedding network for version identification that learns feature representations for music automatically in a supervised way. Triplet convolutional networks learn segment-level features from training data, focusing on the most similar parts between music versions rather than on whole songs. We also compare triplet-based learning with pair-based learning. Our approach has two main advantages over existing ones: (1) music features are embedded automatically and under supervision, so the architecture becomes more promising as the amount of music data keeps growing; (2) segment-level feature embedding is more precise, since the query audio can be any identifiable segment of a music work and the audio can vary in length. Extensive experiments demonstrate the effectiveness of our method.
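To make the contrast between triplet-based and pair-based learning concrete, the following minimal Python sketch shows one common form of each loss, together with a segment-level distance that takes the closest pair of segments. This preview does not give the paper's exact formulation, so everything here is an assumption for illustration: the function names, the Euclidean distance, the margin value of 1.0, and the minimum-over-segments rule are all hypothetical choices, and the embeddings are plain lists standing in for network outputs.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the paper).

    The loss is zero once the negative lies at least `margin` farther
    from the anchor than the positive does.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def contrastive_loss(u, v, same_work, margin=1.0):
    """Pair-based (contrastive) loss (assumed form).

    Pairs from the same work are pulled together; pairs from different
    works are pushed apart until they are at least `margin` away.
    """
    d = euclidean(u, v)
    if same_work:
        return d ** 2
    return max(0.0, margin - d) ** 2


def min_segment_distance(query_segments, candidate_segments):
    """Distance between two recordings as the closest pair of segments.

    This is one hypothetical way to focus on the most similar parts of
    two versions instead of comparing whole songs.
    """
    return min(
        euclidean(q, c)
        for q in query_segments
        for c in candidate_segments
    )


# Toy 2-D embeddings; a real system would take these from the network.
anchor = [0.0, 0.0]
positive = [0.3, 0.4]       # distance 0.5 from the anchor
hard_negative = [0.6, 0.8]  # distance 1.0 from the anchor
easy_negative = [3.0, 4.0]  # distance 5.0 from the anchor

print(triplet_loss(anchor, positive, hard_negative))
print(triplet_loss(anchor, positive, easy_negative))
print(min_segment_distance([anchor], [hard_negative, positive]))
```

In this toy setup the easy negative already satisfies the margin and contributes no loss, while the hard negative still does. The triplet loss constrains only the relative ordering of distances, whereas the pair-based loss penalizes each pair against an absolute target.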

Notes

  1. http://labrosa.ee.columbia.edu/millionsong/secondhand

  2. http://www.spotify.com

Acknowledgments

This work was supported by the Natural Science Foundation of China (No. 61370116).

Author information

Corresponding author

Correspondence to Xiaoyu Qi.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Qi, X., Yang, D., Chen, X. (2018). Triplet Convolutional Network for Music Version Identification. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_44

  • DOI: https://doi.org/10.1007/978-3-319-73603-7_44

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73602-0

  • Online ISBN: 978-3-319-73603-7

  • eBook Packages: Computer Science, Computer Science (R0)