Abstract
Music version identification has long been a difficult task in music information retrieval because versions of the same work can differ in tempo, key, and structure. Most existing methods rely on hand-crafted features, which demand extensive human effort and expert knowledge to design and leave little room for further breakthroughs. We therefore propose a triplet convolutional embedding network for version identification that learns feature representations for music automatically in a supervised way. Triplet convolutional networks learn segment-level features from training data, focusing on the most similar parts of two versions rather than on whole songs. We also compare triplet-based learning with pair-based learning. Our approach has two main advantages over existing ones: (1) music features are embedded automatically and in a supervised way, so the architecture becomes more promising as the amount of music data grows; (2) segment-level feature embedding is more precise, since the query audio can be any identifiable segment of a music work and can vary in length. Extensive experiments demonstrate the effectiveness of our method.
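To make the difference between triplet-based and pair-based learning concrete, the following is a minimal sketch of the two objectives in their commonly used hinge forms. The abstract does not state the exact loss functions, margin, or distance used in the chapter, so everything here is an assumption for illustration: the function names, the Euclidean distance, the default margin of 1.0, and the plain Python lists standing in for segment embeddings produced by a convolutional network.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not taken from the chapter).

    anchor and positive are embeddings of segments from versions of the
    same work; negative is an embedding of a segment from a different work.
    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def contrastive_pair_loss(x1, x2, same_work, margin=1.0):
    """Pair-based contrastive loss (assumed form, not taken from the chapter).

    Matching pairs are pulled together; non-matching pairs are pushed
    apart until they are at least `margin` away from each other.
    """
    d = euclidean(x1, x2)
    if same_work:
        return d ** 2
    return max(0.0, margin - d) ** 2


if __name__ == "__main__":
    # Hypothetical 2-D embeddings, chosen only to exercise the functions.
    anchor = [0.0, 0.0]
    positive = [0.3, 0.4]
    easy_negative = [3.0, 4.0]
    hard_negative = [0.6, 0.8]

    print(triplet_margin_loss(anchor, positive, easy_negative))
    print(triplet_margin_loss(anchor, positive, hard_negative))
    print(contrastive_pair_loss(anchor, positive, same_work=True))
```

The triplet form only constrains relative distances within each triplet, while the pair form constrains each pair against a fixed margin on its own; which behaves better for version identification is the kind of question the comparison in the chapter addresses.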
Acknowledgments
This work was supported by the Natural Science Foundation of China (No. 61370116).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Qi, X., Yang, D., Chen, X. (2018). Triplet Convolutional Network for Music Version Identification. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_44
DOI: https://doi.org/10.1007/978-3-319-73603-7_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)