ATRemix: An Auto-tune Remix Dataset for Singer Recognition

Wang, Lifang; Wang, Bingyuan; Tan, Guanghao; Zhang, Wei-Qiang; Feng, Jun; Zhu, Bing; Wang, Shenjin

doi:10.1007/978-3-031-20233-9_35

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13628)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

In recent years, with the growth of video websites such as YouTube, TikTok, and Bilibili, a great number of auto-tune remix audio clips are produced every day. Auto-tune remixes are usually made from existing well-known recordings, and the original clips can be tuned into various remixes through professional editing techniques. Because the characteristics of the singer are usually preserved during creation, the original material can be traced by singer recognition methods. This paper focuses on singer recognition for auto-tune remixes. As this topic has not been discussed before, we create a dataset of auto-tune remix audio and attempt to recognize the identity of the singer. First, we train an x-vector model on the TIMIT dataset and evaluate it on the ATRemix dataset. Second, we train different models on the ATRemix dataset and use the SubATRemix dataset as the test set, which shows good performance.
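The abstract does not state how a trained model assigns a singer label to a test clip. As a hedged illustration only, one common way to turn x-vector-style embeddings into closed-set singer identification is to average each singer's training embeddings into a centroid and label each test clip with the most cosine-similar centroid, then report top-1 accuracy. This is a minimal sketch assuming embeddings have already been extracted; the function names, the `(singer_id, embedding)` data layout, and the centroid/cosine scoring rule are all hypothetical and are not claimed to be the authors' method.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(train_embeddings):
    """Average each singer's training embeddings into one centroid.

    train_embeddings: list of (singer_id, embedding) pairs
    (hypothetical layout, assumed for illustration).
    """
    sums = {}
    counts = defaultdict(int)
    for singer, emb in train_embeddings:
        if singer not in sums:
            sums[singer] = [0.0] * len(emb)
        sums[singer] = [s + e for s, e in zip(sums[singer], emb)]
        counts[singer] += 1
    return {s: [v / counts[s] for v in vec] for s, vec in sums.items()}


def identify(centroids, emb):
    """Return the enrolled singer whose centroid is most cosine-similar to emb."""
    return max(centroids, key=lambda s: cosine(centroids[s], emb))


def top1_accuracy(centroids, test_embeddings):
    """Fraction of test clips whose predicted singer matches the true label."""
    correct = sum(identify(centroids, emb) == singer for singer, emb in test_embeddings)
    return correct / len(test_embeddings)


# Toy 2-D "embeddings" standing in for real x-vectors (made-up values).
train = [("A", [1.0, 0.0]), ("A", [0.9, 0.1]), ("B", [0.0, 1.0]), ("B", [0.1, 0.9])]
test = [("A", [0.8, 0.2]), ("B", [0.3, 0.7]), ("A", [0.4, 0.6])]
centroids = enroll(train)
print(top1_accuracy(centroids, test))
```

In this toy run the third test clip, labeled "A", lies closer to the "B" centroid, so two of the three clips are identified correctly. Centroid scoring is chosen here only for brevity; PLDA back-ends or a softmax classifier head are equally plausible alternatives.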

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. U1836219 and No. 62276153, and in part by a grant from the Guoqiang Institute, Tsinghua University.

Author information

Authors and Affiliations

Key Laboratory of Media Audio and Video, Communication University of China, Beijing, 100024, China
Lifang Wang, Jun Feng & Bing Zhu
School of Electronic Engineering and Computer Science, Peking University, Beijing, 100871, China
Bingyuan Wang
School of Information and Electronics, Beijing Institute of Technology, Beijing, 100081, China
Guanghao Tan
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Wei-Qiang Zhang & Shenjin Wang

Corresponding author

Correspondence to Wei-Qiang Zhang.

Editor information

Editors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Weihong Deng
Tsinghua University, Beijing, China
Jianjiang Feng
Beihang University, Beijing, China
Di Huang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Meina Kan
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhenan Sun
Tsinghua University, Beijing, China
Fang Zheng
China Electronics Standardization Institute, Beijing, China
Wenfeng Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhaofeng He

About this paper

Cite this paper

Wang, L. et al. (2022). ATRemix: An Auto-tune Remix Dataset for Singer Recognition. In: Deng, W., et al. Biometric Recognition. CCBR 2022. Lecture Notes in Computer Science, vol 13628. Springer, Cham. https://doi.org/10.1007/978-3-031-20233-9_35

DOI: https://doi.org/10.1007/978-3-031-20233-9_35
Published: 03 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20232-2
Online ISBN: 978-3-031-20233-9
eBook Packages: Computer Science, Computer Science (R0)
