Robust speaker recognition in cross-channel condition based on Gaussian mixture model

Multimedia Tools and Applications

Abstract

One of the most difficult challenges for speaker recognition is dealing with channel variability. In this paper, several new cross-channel compensation techniques are introduced for a Gaussian mixture model-universal background model (GMM-UBM) speaker verification system. These techniques include wideband noise reduction, echo cancellation, a simplified feature-domain latent factor analysis (LFA), and data-driven score normalization. A novel dynamic Gaussian selection algorithm is developed that reduces the feature compensation time by more than 60% without any performance loss. The performance of the different techniques across varying train/test channel conditions is presented and discussed. The findings are that speech enhancement, which used to be neglected for telephone speech, is essential for cross-channel tasks, and that the channel compensation techniques developed for telephone-channel speech also perform effectively. The per-microphone performance analysis further shows that speech enhancement can greatly boost the effects of the other techniques, especially on channels with larger signal-to-noise ratio (SNR) variance. All results are presented on NIST SRE 2006 and 2008 data and show a promising performance gain over the baseline. The developed system is also compared with other state-of-the-art speaker verification systems. The results show that it obtains comparable or even better performance while consuming much less CPU time, making it more suitable for practical use.
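
For readers unfamiliar with the GMM-UBM framework named in the abstract, the following minimal sketch shows the standard verification score: the average per-frame log-likelihood ratio between a speaker GMM and the universal background model. It is an illustration under stated assumptions, not the authors' system: the toy two-component models, the diagonal covariances, and the names `gmm_frame_loglik` and `llr_score` are all hypothetical, and none of the paper's compensation techniques (noise reduction, LFA, score normalization, dynamic Gaussian selection) are implemented here.

```python
import numpy as np

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    X: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    """
    diff = X[:, None, :] - means[None, :, :]              # (T, M, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)             # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1) # normalizer
    )                                                     # (T, M)
    log_comp = np.log(weights) + log_gauss
    # Log-sum-exp over mixture components, stabilized by the row maximum.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def llr_score(X, speaker, ubm):
    """Average log-likelihood ratio: speaker model versus UBM."""
    return float(np.mean(gmm_frame_loglik(X, *speaker) - gmm_frame_loglik(X, *ubm)))

# Toy models (assumed values): the speaker model shares the UBM's weights
# and variances and differs only in its means, as in mean-only adaptation.
weights = np.array([0.5, 0.5])
variances = np.ones((2, 2))
ubm = (weights, np.array([[0.0, 0.0], [4.0, 4.0]]), variances)
speaker = (weights, np.array([[1.0, 1.0], [5.0, 5.0]]), variances)

target_frames = speaker[1].copy()    # frames lying on the speaker means
impostor_frames = ubm[1].copy()      # frames lying on the UBM means

target_score = llr_score(target_frames, speaker, ubm)
impostor_score = llr_score(impostor_frames, speaker, ubm)
print(target_score, impostor_score)
```

A trial is accepted when the score exceeds a decision threshold. In this toy setup the target frames score above zero and the impostor frames below it, so any threshold between the two separates them.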

Acknowledgements

This work was supported by the National Natural Science Foundation of China and Microsoft Research Asia under Grant No. 60776800, and in part by the National High Technology Development Program of China (863 Program) under Grants No. 2006AA010101, No. 2007AA04Z223 and No. 2008AA02Z414. We thank Liang He and Shan Zhong from Tsinghua University for providing their experimental results for the DGSV-SVM and MLLR-SVM systems. We would also like to thank Dr. Michael T. Johnson from Marquette University and the anonymous reviewers for their comments, which helped to improve the content of this paper.

Author information

Corresponding author

Correspondence to Jia Liu.

About this article

Cite this article

Shan, Y., Liu, J. Robust speaker recognition in cross-channel condition based on Gaussian mixture model. Multimed Tools Appl 52, 159–173 (2011). https://doi.org/10.1007/s11042-009-0456-8

  • DOI: https://doi.org/10.1007/s11042-009-0456-8
