Abstract
In recent years, with the growth of video websites such as YouTube, TikTok, and Bilibili, a large number of auto-tune remix audio clips are produced every day. Auto-tune remixes are usually made from existing well-known audio: the original clips are tuned into various remixes through professional editing techniques. The characteristics of the singer are usually preserved during this process, so the original material can be traced by singer recognition methods. This paper focuses on singer recognition for auto-tune remixes. As this topic has not been discussed before, we create a dataset of auto-tune remix audio, ATRemix, and attempt to recognize the identity of the singer. First, we train an x-vector model on the TIMIT dataset and evaluate it on the ATRemix dataset. Second, we train different models on the ATRemix dataset and test them on the SubATRemix dataset, where they show good performance.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grants No. U1836219 and No. 62276153, and in part by a grant from the Guoqiang Institute, Tsinghua University.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, L. et al. (2022). ATRemix: An Auto-tune Remix Dataset for Singer Recognition. In: Deng, W., et al. Biometric Recognition. CCBR 2022. Lecture Notes in Computer Science, vol 13628. Springer, Cham. https://doi.org/10.1007/978-3-031-20233-9_35
DOI: https://doi.org/10.1007/978-3-031-20233-9_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20232-2
Online ISBN: 978-3-031-20233-9
eBook Packages: Computer Science, Computer Science (R0)