Abstract
One of the most difficult challenges for speaker recognition is dealing with channel variability. In this paper, several new cross-channel compensation techniques are introduced for a Gaussian mixture model-universal background model (GMM-UBM) speaker verification system. These techniques include wideband noise reduction, echo cancellation, a simplified feature-domain latent factor analysis (LFA), and data-driven score normalization. A novel dynamic Gaussian selection algorithm is developed that reduces the feature compensation time by more than 60% without any performance loss. The performance of the different techniques across varying train/test channel conditions is presented and discussed. The results show that speech enhancement, which has typically been neglected for telephone speech, is essential for cross-channel tasks, and that the channel compensation techniques developed for telephone speech also perform effectively in cross-channel conditions. A per-microphone performance analysis further shows that speech enhancement can greatly boost the effect of the other techniques, especially on channels with larger signal-to-noise ratio (SNR) variance. All results are reported on NIST SRE 2006 and 2008 data and show a promising performance gain over the baseline. The developed system is also compared with other state-of-the-art speaker verification systems. The comparison shows that it obtains comparable or even better performance while consuming much less CPU time, making it more suitable for practical use.





Acknowledgements
This work was supported by the National Natural Science Foundation of China and Microsoft Research Asia under Grant No. 60776800, and in part by the National High Technology Development Program of China (863 Program) under Grant No. 2006AA010101, No. 2007AA04Z223 and No. 2008AA02Z414. We thank Liang He and Shan Zhong from Tsinghua University for providing their experimental results for the DGSV-SVM and MLLR-SVM systems. We would also like to thank Dr. Michael T. Johnson from Marquette University and the anonymous reviewers for their comments, which helped to improve the content of this paper.
Cite this article
Shan, Y., Liu, J. Robust speaker recognition in cross-channel condition based on Gaussian mixture model. Multimed Tools Appl 52, 159–173 (2011). https://doi.org/10.1007/s11042-009-0456-8