Triplet Convolutional Network for Music Version Identification

Qi, Xiaoyu; Yang, Deshun; Chen, Xiaoou

doi:10.1007/978-3-319-73603-7_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10704)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Music version identification has long been a difficult task in music information retrieval because versions of the same work can differ in tempo, key and structure. Most existing methods rely on hand-crafted features, which take extensive human effort and expert knowledge to design and leave little room for further breakthroughs. We therefore propose a triplet convolutional embedding network for version identification that learns feature representations for music automatically in a supervised way. The network learns segment-level features from training data, focusing on the most similar parts of two versions rather than on whole songs. We also compare triplet-based learning with pair-based learning. Our approach has two main advantages over existing ones: (1) music features are embedded automatically and under supervision, so the architecture becomes more promising as the amount of music data keeps growing; (2) embedding features at the segment level is more precise, since the query audio can be any identifiable segment of a music work and audio clips can differ in length. Extensive experiments demonstrate the effectiveness of our method.
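The abstract does not state the loss functions used, so the following is only a minimal sketch of the ideas it names, under stated assumptions. It uses the commonly used hinge form of the triplet loss, a common contrastive form of pair-based loss, and a hypothetical helper `segment_level_distance` that scores two recordings by their closest pair of segment embeddings. The function names, the Euclidean distance and the default `margin` are illustrative choices, not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss (assumed form, not confirmed by the abstract).

    The loss is zero once the positive (a segment from another version of
    the same work) is closer to the anchor than the negative (a segment
    from a different work) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def contrastive_loss(x1, x2, same_work, margin=1.0):
    """A common pair-based loss (assumed form), shown for comparison.

    Pairs from the same work are pulled together; pairs from different
    works are pushed apart until they are at least `margin` away.
    """
    d = euclidean(x1, x2)
    if same_work:
        return d ** 2
    return max(0.0, margin - d) ** 2


def segment_level_distance(query_segments, candidate_segments):
    """Hypothetical scoring rule: distance of the closest segment pair."""
    return min(
        euclidean(q, c) for q in query_segments for c in candidate_segments
    )


if __name__ == "__main__":
    anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
    print(triplet_loss(anchor, positive, negative))  # well-separated triplet
    print(triplet_loss(anchor, negative, positive))  # violated triplet
    print(segment_level_distance([anchor, [10.0, 10.0]], [negative, [10.0, 11.0]]))
```

The triplet loss looks at relative distances within one (anchor, positive, negative) triple, whereas the pair-based loss constrains each pair against an absolute margin. Taking the minimum over segment pairs is one simple way to let a short query match any part of a longer recording.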

Acknowledgments

This work was supported by the Natural Science Foundation of China (No. 61370116).

Author information

Authors and Affiliations

Institute of Computer Science and Technology, Peking University, Beijing, People’s Republic of China
Xiaoyu Qi, Deshun Yang & Xiaoou Chen

Corresponding author

Correspondence to Xiaoyu Qi.

Editor information

Editors and Affiliations

Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Klaus Schoeffmann
Chulalongkorn University, Bangkok, Thailand
Thanarat H. Chalidabhongse
City University of Hong Kong, Hong Kong, China
Chong Wah Ngo
Chulalongkorn University, Bangkok, Thailand
Supavadee Aramvith
Dublin City University, Dublin, Ireland
Noel E. O’Connor
Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Tampere University of Technology, Tampere, Finland
Moncef Gabbouj
Rutgers University, Piscataway, New Jersey, USA
Ahmed Elgammal

About this paper

Cite this paper

Qi, X., Yang, D., Chen, X. (2018). Triplet Convolutional Network for Music Version Identification. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10704. Springer, Cham. https://doi.org/10.1007/978-3-319-73603-7_44

DOI: https://doi.org/10.1007/978-3-319-73603-7_44
Published: 13 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73602-0
Online ISBN: 978-3-319-73603-7
eBook Packages: Computer Science, Computer Science (R0)
