Abstract
Evaluation trials are crucial for measuring the performance of speaker verification systems. However, designing trials that faithfully reflect system performance and accurately distinguish between different systems remains an open issue. In this paper, we focus on a particular problem: the impact of trials that are easy for the majority of systems to solve. We show that these ‘easy trials’ not only yield over-optimistic absolute performance, but also bias relative performance in system comparisons when they are asymmetrically distributed. This motivates the idea of mining ‘hard trials’, i.e., trials that current representative techniques find difficult. We report three approaches to retrieving hard trials and study the properties of the retrieved trials from the perspectives of both machines and humans. Finally, we propose a novel visualization tool that we name the Config-Performance (C-P) map. In this map, the value at each location represents the performance obtained with a particular proportion of easy and hard trials, thus offering a global view of the system under various test conditions. The identified hard trials and the code of the C-P map tool have been released at http://lilt.cslt.org/trials/demo/.
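To make the C-P map idea concrete, the following is a minimal sketch of how such a map could be computed from verification scores. It assumes the trials have already been split into easy and hard target/nontarget score arrays; the function names `eer` and `cp_map` and all parameters are illustrative and are not the released tool's API.

```python
import numpy as np

def eer(tgt_scores, non_scores):
    """Equal error rate via a simple threshold sweep over all observed scores."""
    thresholds = np.sort(np.concatenate([tgt_scores, non_scores]))
    far = np.array([(non_scores >= t).mean() for t in thresholds])  # false accept rate
    frr = np.array([(tgt_scores < t).mean() for t in thresholds])   # false reject rate
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

def cp_map(easy_tgt, hard_tgt, easy_non, hard_non, steps=5, n=200, seed=0):
    """Grid of EER values: cell (i, j) evaluates a trial set that mixes a
    fraction i/(steps-1) of hard target trials and j/(steps-1) of hard
    nontarget trials, with the remainder drawn from the easy trials."""
    rng = np.random.default_rng(seed)
    fracs = np.linspace(0.0, 1.0, steps)
    grid = np.zeros((steps, steps))
    for i, a in enumerate(fracs):
        for j, b in enumerate(fracs):
            k_t, k_n = int(a * n), int(b * n)
            tgt = np.concatenate([rng.choice(hard_tgt, k_t),
                                  rng.choice(easy_tgt, n - k_t)])
            non = np.concatenate([rng.choice(hard_non, k_n),
                                  rng.choice(easy_non, n - k_n)])
            grid[i, j] = eer(tgt, non)
    return grid
```

The all-easy corner of the grid reports near-zero error, while the all-hard corner reports the pessimistic extreme; the interior cells show how performance degrades as the trial mix hardens, which is the global view the C-P map is intended to give.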
Data availability
All the data used in this paper are public data.
Code availability
The code has been published at https://gitlab.com/csltstu/sunine.
Funding
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 62171250, and by the Fundamental Research Funds for the Central Universities of China.
Author information
Authors and Affiliations
Contributions
L.L.: Methodology, Software, Writing - Original Draft; Di W.: Software; A.A.: Writing - Review & Editing; Dong W.: Conceptualization, Supervision, Funding acquisition, Writing - Review & Editing.
Corresponding author
Ethics declarations
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent to publication
Participants were informed that their responses would be published in a form that does not reveal their identity.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, L., Wang, D., Abel, A. et al. On evaluation trials in speaker verification. Appl Intell 54, 113–130 (2024). https://doi.org/10.1007/s10489-023-05071-9