research-article

Fully Fused Cover Song Identification Model via Feature Fusing and Clustering

Authors:
Qiang Yuan

Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; National Engineering Research Center for Mobile Internet Security Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

ORCID: 0000-0001-9755-7216

Shibiao Xu

Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; National Engineering Research Center for Mobile Internet Security Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

ORCID: 0000-0003-4037-9900

Li Guo

Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; National Engineering Research Center for Mobile Internet Security Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

ORCID: 0000-0002-9723-3294

ICCIP '22: Proceedings of the 8th International Conference on Communication and Information Processing, November 2022, Pages 61–66, https://doi.org/10.1145/3571662.3571672

Published: 03 January 2023


ABSTRACT

In recent years, Cover Song Identification (CSI) based on Siamese networks and music representation learning has achieved good performance; however, several problems remain, such as limited feature fusion, missing decision thresholds, and single data labels. In this paper, we propose a novel fully fused cover song identification model via feature fusing and clustering. The proposed model consists of a fusion feature extraction structure, a channel separation decision structure, and a music feature clustering structure. First, we combine the preprocessed features of the two inputs along the channel dimension to achieve full feature fusion and to increase the degree to which the two songs are fused during feature extraction. Second, we introduce channel separation to compute multi-channel cross-features, which improves the model's ability to learn the differences between feature channels; combined with a binary decision network, this compensates for the missing decision threshold in music representation learning. Finally, feature clustering generates invisible feature labels, which enrich the types of cover data labels and reduce the difficulty of training. The model is trained in stages to optimize the clustering loss and the classification loss for cover and non-cover pairs, respectively. The model is validated on three public datasets, and experiments show that it achieves competitive results.
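The abstract describes the pipeline only at a high level: it gives no feature shapes, no cross-feature formula, no decision-network architecture, and no clustering algorithm. The pure-Python sketch below is therefore an illustration under stated assumptions, not the authors' implementation. It assumes that each song's preprocessed features are stored as channel × frame × bin lists, that "channel separation" means one frame-by-frame cosine cross-similarity matrix per channel, that a single logistic unit with hypothetical weights can stand in for the binary decision network, and that plain k-means can stand in for the feature-clustering step. All function names (`fuse_channels`, `channel_cross_features`, `decide_cover`, `binary_cross_entropy`, `kmeans_pseudo_labels`) are hypothetical.

```python
import math
import random

# Assumed layout (not from the paper): a song's preprocessed features are a
# list of channels; each channel is a list of frames; each frame is a list of floats.


def fuse_channels(feat_a, feat_b):
    """Early fusion (assumed form): stack both songs along the channel axis,
    so one network sees 2C channels instead of two separate C-channel inputs."""
    if len(feat_a[0]) != len(feat_b[0]):
        raise ValueError("pad or crop both songs to the same number of frames")
    return feat_a + feat_b


def cosine(u, v):
    """Cosine similarity of two frames; 0.0 if either frame is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def channel_cross_features(feat_a, feat_b):
    """Channel separation (assumed form): one frame-by-frame cross-similarity
    matrix per channel, instead of a single matrix over all channels."""
    return [
        [[cosine(fa, fb) for fb in chan_b] for fa in chan_a]
        for chan_a, chan_b in zip(feat_a, feat_b)
    ]


def decide_cover(cross, weights, bias):
    """Stand-in for the binary decision network: pool each channel's matrix
    (mean of row-wise maxima), apply a logistic unit, and cut at 0.5, so no
    separate distance threshold has to be tuned."""
    pooled = [sum(max(row) for row in m) / len(m) for m in cross]
    z = bias + sum(w * s for w, s in zip(weights, pooled))
    p = 1.0 / (1.0 + math.exp(-z))
    return p, p >= 0.5


def binary_cross_entropy(p, is_cover, eps=1e-12):
    """Classification loss for one cover (True) / non-cover (False) pair."""
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if is_cover else -math.log(1.0 - p)


def kmeans_pseudo_labels(embeddings, k, iters=20, seed=0):
    """Stand-in for feature clustering: plain k-means over song embeddings;
    each song's cluster index serves as an extra (pseudo) label."""
    rng = random.Random(seed)
    centroids = [list(e) for e in rng.sample(embeddings, k)]
    labels = [0] * len(embeddings)
    for _ in range(iters):
        # Assign every embedding to its nearest centroid (squared Euclidean).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(e, centroids[j])))
            for e in embeddings
        ]
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = [e for e, lab in zip(embeddings, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

For example, with the toy 2-channel, 2-frame song `[[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]]]` compared against itself, `fuse_channels` returns 4 channels, each channel's pooled score is about 1.0, and `decide_cover` with hypothetical weights `[1.0, 1.0]` and bias `0.0` returns a probability of about 0.88, i.e. a cover decision. Under the staged training described in the abstract, the clustering objective and the pair classification loss would be optimized in separate stages; the sketch omits the networks themselves.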

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query representation

Published in

ICCIP '22: Proceedings of the 8th International Conference on Communication and Information Processing
November 2022
219 pages
ISBN: 9781450397100
DOI: 10.1145/3571662

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 January 2023
Author Tags
cover song identification
feature clustering
feature fusing
multi-channel cross-features
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
ICCIP '22 Paper Acceptance Rate: 61 of 301 submissions, 20%
Overall Acceptance Rate: 61 of 301 submissions, 20%
Article Metrics
- Total Citations: 0
- Total Downloads: 40
- Downloads (last 12 months): 21
- Downloads (last 6 weeks): 1
Cited By
This publication has not been cited yet
