This paper presents speaker recognition (SR) systems for text-independent
speaker verification in the cross-lingual (English vs. Persian) task
(Task 2) of the Short-duration Speaker Verification Challenge (SdSVC)
2021.
We describe the ResNet-like and ECAPA-TDNN-like topologies we applied
and analyze multi-session scoring techniques benchmarked on the SdSVC
challenge datasets. We review several modifications of the basic
ResNet-like architecture and of the training strategies that improve
speaker verification quality.
We also introduce an alpha query expansion (αQE) based technique for
aggregating enrollment embeddings at test time. Compared with simple
averaging of the embeddings, it improves minDCF by 0.042 (from 0.12
to 0.078) for the ECAPA-TDNN system. Finally, we propose a trial-level,
distance-based, non-parametric impostor/target detector (KrTC) that
filters out the worst enrollment samples at test time to further
improve system performance.
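The αQE enrollment aggregation can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so the code assumes L2-normalized embeddings, uses the plain enrollment mean as the query, and weights every enrollment embedding by its cosine similarity to that query (negative values clipped to zero) raised to the power `alpha`, following the usual αQE weighting from image retrieval. The function names and the default `alpha` are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def mean_enrollment(enroll: np.ndarray) -> np.ndarray:
    """Baseline: average of the L2-normalized enrollment embeddings, renormalized."""
    return l2_normalize(l2_normalize(enroll).mean(axis=0))


def alpha_qe_enrollment(enroll: np.ndarray, alpha: float = 3.0) -> np.ndarray:
    """Hypothetical alphaQE-style aggregation (assumption, not the paper's exact recipe).

    The plain mean acts as the query; each enrollment embedding is weighted by
    its cosine similarity to that query raised to the power `alpha`, so samples
    far from the bulk of the enrollment set contribute less.
    """
    e = l2_normalize(enroll)                  # (n, d) unit-norm embeddings
    query = mean_enrollment(e)                # (d,) initial query
    sims = np.clip(e @ query, 0.0, None)      # cosine similarities, negatives zeroed
    weights = sims ** alpha                   # alphaQE weights
    return l2_normalize(query + weights @ e)  # query plus weighted enrollment samples
```

A trial score would then be the cosine similarity between the aggregated embedding and the L2-normalized test embedding. Raising similarities to a power sharpens the weighting, so a single off-speaker or noisy enrollment utterance pulls the aggregate far less than it does under plain averaging.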
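The abstract does not specify how KrTC decides which enrollment samples to discard, so the following is only a generic stand-in, not KrTC itself: a hypothetical non-parametric, distance-based filter that ranks each enrollment embedding of a trial by its mean cosine similarity to the other enrollment embeddings and drops the least consistent ones before aggregation. The function name and the `drop` and `min_keep` parameters are illustrative assumptions.

```python
import numpy as np


def filter_enrollment(enroll: np.ndarray, drop: int = 1, min_keep: int = 1) -> np.ndarray:
    """Hypothetical distance-based enrollment filter (a stand-in, not the paper's KrTC).

    Each enrollment embedding is scored by its mean cosine similarity to the
    other enrollment embeddings of the same trial; the `drop` least consistent
    samples are removed, always keeping at least `min_keep` samples.
    Returns the kept embeddings, L2-normalized, in their original order.
    """
    e = enroll / np.linalg.norm(enroll, axis=1, keepdims=True)
    n = e.shape[0]
    if n <= min_keep:
        return e
    sim = e @ e.T                              # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)                 # ignore self-similarity
    consistency = sim.sum(axis=1) / (n - 1)    # mean similarity to the others
    keep_count = max(min_keep, n - drop)
    keep = np.argsort(consistency)[::-1][:keep_count]
    return e[np.sort(keep)]
```

The filtered set can then be passed to any aggregation step (mean or αQE) before scoring. Ranking by within-trial consistency needs no trained parameters, which matches the "non-parametric" framing, but the actual KrTC criterion may differ.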
Cite as: Gusev, A., Vinogradova, A., Novoselov, S., Astapov, S. (2021) SdSVC Challenge 2021: Tips and Tricks to Boost the Short-Duration Speaker Verification System Performance. Proc. Interspeech 2021, 2307-2311, doi: 10.21437/Interspeech.2021-1737
@inproceedings{gusev21_interspeech,
  author={Aleksei Gusev and Alisa Vinogradova and Sergey Novoselov and Sergei Astapov},
  title={{SdSVC Challenge 2021: Tips and Tricks to Boost the Short-Duration Speaker Verification System Performance}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2307--2311},
  doi={10.21437/Interspeech.2021-1737}
}