ABSTRACT
In recent years, Cover Song Identification (CSI) based on Siamese networks and music representation learning has achieved good performance. However, several problems remain, such as limited feature fusion, the lack of a decision threshold, and a single type of data label. In this paper, we propose a novel fully fused cover song identification model via feature fusing and clustering. The proposed model consists of a fusion feature extraction structure, a channel separation decision structure, and a music feature clustering structure. First, we combine the pre-processed features of the two inputs along the channel dimension to achieve full feature fusion and increase the degree of fusion between the two songs during feature extraction. Second, we introduce channel separation to calculate multi-channel cross-features, which improves the model's ability to learn the differences between feature channels, and we combine it with a binary decision network to avoid the lack of a decision threshold in music representation learning. Finally, feature clustering generates invisible feature labels, which enriches the types of cover data labels and reduces the difficulty of training. The model is trained in stages to optimize, respectively, the clustering loss and the classification loss for cover and non-cover pairs. The model is validated on three public datasets, and experiments show that it achieves competitive results.
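The channel-wise fusion and channel separation steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 12 x 50 chroma-like input shape, and the use of a cosine similarity matrix as the cross-feature are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def fuse_along_channels(feat_a, feat_b):
    """Stack two (bins, frames) feature maps into one (2, bins, frames) input.

    Assumption: both songs were pre-processed to the same shape.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("both songs must be pre-processed to the same shape")
    return np.stack([feat_a, feat_b], axis=0)


def split_channels(fused):
    """Separate a (C, H, W) map back into its C single-channel maps."""
    return [fused[c] for c in range(fused.shape[0])]


def cross_feature(chan_x, chan_y):
    """Hypothetical cross-feature: frame-by-frame cosine similarity matrix."""
    # Normalize each frame (column) to unit length; epsilon avoids division by zero.
    x = chan_x / (np.linalg.norm(chan_x, axis=0, keepdims=True) + 1e-8)
    y = chan_y / (np.linalg.norm(chan_y, axis=0, keepdims=True) + 1e-8)
    return x.T @ y  # shape: (frames_x, frames_y)


# Random stand-ins for the pre-processed features of two songs.
rng = np.random.default_rng(0)
song_a = rng.random((12, 50))
song_b = rng.random((12, 50))

fused = fuse_along_channels(song_a, song_b)  # joint two-channel input
chan_a, chan_b = split_channels(fused)       # channel separation
sim = cross_feature(chan_a, chan_b)          # one cross-feature map
print(fused.shape, sim.shape)
```

In a real model the fused tensor would pass through convolutional layers before separation, and the cross-features would feed the binary decision network; the sketch only shows the tensor bookkeeping.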
Index Terms
- Fully Fused Cover Song Identification Model via Feature Fusing and Clustering