Text-dependent short-duration speaker verification involves two challenges. The primary challenge is verifying the speaker’s identity; a secondary challenge is often verifying the lexical content of the pass-phrase. In this paper, we propose using two sub-systems to handle these two tasks in parallel: one models speaker identity under the assumption that the lexical content is known, and the other models lexical content in a speaker-dependent manner. The text-dependent speaker verification sub-system is based on hidden Markov models, and the lexical content verification sub-system is based on models of speech segments that use a distinct Gaussian mixture model for each segment. Furthermore, a mixture selection method based on KL divergence was applied to refine the lexical content sub-system by making its models more discriminative. Experiments on part 1 of the RedDots database showed that the proposed combination of the two sub-systems outperformed the baseline system by 39.8%, 51.1% and 37.3% in terms of the ‘imposter_correct’, ‘target_wrong’ and ‘imposter_wrong’ metrics, respectively.
Cite as: Ma, J., Irtza, S., Sriskandaraja, K., Sethu, V., Ambikairajah, E. (2016) Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification. Proc. Interspeech 2016, 435-439, doi: 10.21437/Interspeech.2016-825
@inproceedings{ma16_interspeech,
  author={Jianbo Ma and Saad Irtza and Kaavya Sriskandaraja and Vidhyasaharan Sethu and Eliathamby Ambikairajah},
  title={{Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={435--439},
  doi={10.21437/Interspeech.2016-825}
}