DOI: 10.1145/3571662.3571672
research-article

Fully Fused Cover Song Identification Model via Feature Fusing and Clustering

Published: 03 January 2023

ABSTRACT

In recent years, Cover Song Identification (CSI) based on Siamese networks and music representation learning has achieved good performance. However, several problems remain, including limited feature fusion, the lack of a decision threshold, and reliance on a single type of data label. In this paper, we propose a novel fully fused cover song identification model based on feature fusing and clustering. The proposed model comprises a fusion feature extraction structure, a channel separation decision structure, and a music feature clustering structure. First, we combine the pre-processed features of the two inputs along the channel dimension to achieve full feature fusion and increase the degree of fusion between the two songs during feature extraction. Second, we introduce channel separation to compute multi-channel cross-features, which improves the model's ability to learn differences between feature channels; combined with a binary decision network, this avoids the lack of a decision threshold in music representation learning. Finally, feature clustering generates invisible feature labels, which enriches the types of cover data labels and reduces the difficulty of training. The model is trained in stages to optimize the clustering loss and the classification loss for cover and non-cover pairs, respectively. The model is validated on three public datasets, and experiments show that it achieves competitive results.
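
As a rough illustration of two ideas named in the abstract, the sketch below stacks the feature maps of two songs along a channel axis and assigns cluster-derived labels to embeddings. It is not the authors' implementation: the function names `fuse_along_channels` and `nearest_centroid_labels`, the list-of-lists feature layout, and the use of nearest-centroid assignment as the clustering step are all assumptions made for this example.

```python
from typing import List, Sequence

# Assumed layout: one feature map per song, rows = frequency bins, cols = time frames.
Matrix = List[List[float]]


def fuse_along_channels(song_a: Matrix, song_b: Matrix) -> List[Matrix]:
    """Stack two same-shaped, non-empty feature maps into one 2-channel input."""
    shape_a = (len(song_a), len(song_a[0]))
    shape_b = (len(song_b), len(song_b[0]))
    if shape_a != shape_b:
        raise ValueError(f"feature maps must match in shape: {shape_a} vs {shape_b}")
    # Channel 0 holds song A and channel 1 holds song B, so a network that
    # consumes this input sees both songs from its first layer onward.
    return [[row[:] for row in song_a], [row[:] for row in song_b]]


def nearest_centroid_labels(
    embeddings: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[int]:
    """Label each embedding with the index of its closest centroid (squared L2)."""
    labels = []
    for vec in embeddings:
        dists = [
            sum((v - c) ** 2 for v, c in zip(vec, centroid))
            for centroid in centroids
        ]
        labels.append(dists.index(min(dists)))
    return labels


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    fused = fuse_along_channels(a, b)
    print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, bins, frames
    print(nearest_centroid_labels([[0.0, 0.1], [0.9, 1.0]], [[0.0, 0.0], [1.0, 1.0]]))
```

In a real system the fused input would feed a convolutional feature extractor, and the centroids would come from a clustering algorithm fitted on learned embeddings. Both are left out here because the abstract does not specify them.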

    • Published in

      ICCIP '22: Proceedings of the 8th International Conference on Communication and Information Processing
      November 2022
      219 pages
      ISBN:9781450397100
      DOI:10.1145/3571662

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      • ICCIP '22 Paper Acceptance Rate: 61 of 301 submissions, 20%
      • Overall Acceptance Rate: 61 of 301 submissions, 20%