
Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector

International Journal of Speech Technology

Abstract

In this paper, we investigate the use of Multiple Background Models (M-BMs) in Speaker Verification (SV). We cluster the speakers either by their Vocal Tract Lengths (VTLs) or by their speaker-specific Maximum Likelihood Linear Regression (MLLR) super-vectors, and build a separate Background Model (BM) for each cluster. We show that M-BMs provide improved performance compared to a single or gender-wise Universal Background Model (UBM). Although the computational complexity during testing is the same for M-BMs and a UBM, M-BMs require switching models depending on the claimant, and score normalization becomes difficult. To overcome these problems, we propose a novel method, inspired by the conventional Feature Mapping (FM) technique, that aggregates the information from the multiple background models into a single gender-independent UBM. We show that this approach improves on the conventional UBM method while still permitting easy use of score-normalization techniques. Compared to the conventional single-UBM system, the proposed method provides a relative improvement in Equal Error Rate (EER) of 13.65 % with VTL clustering and 15.43 % with MLLR super-vector clustering. When AT-norm score normalization is used, the proposed method provides a relative EER improvement of 20.96 % for VTL clustering and 22.48 % for MLLR super-vector based clustering. Furthermore, the proposed method is compared with a gender-dependent speaker verification system using a Gaussian Mixture Model-Support Vector Machine (GMM-SVM) super-vector linear kernel. The experimental results show that the proposed method performs better than the gender-dependent speaker verification system.
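The results above are quoted as relative reductions in EER, the operating point at which the miss rate equals the false-alarm rate. As a minimal sketch (not the authors' evaluation code, which the abstract does not describe), the Python below approximates EER by sweeping every observed score as a decision threshold, then computes the relative improvement of one system over a baseline. The function names and the "accept if score ≥ threshold" convention are assumptions made for illustration.

```python
from typing import Sequence


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate the EER by trying every observed score as a threshold.

    Assumed convention (illustrative): a trial is accepted when
    score >= threshold. At each threshold we compute the miss rate on
    target trials and the false-alarm rate on non-target trials, and
    return their average at the threshold where the two are closest.
    """
    best_gap = float("inf")
    best_eer = 1.0
    for thr in sorted(set(target) | set(nontarget)):
        # Fraction of genuine (target) trials wrongly rejected.
        p_miss = sum(s < thr for s in target) / len(target)
        # Fraction of impostor (non-target) trials wrongly accepted.
        p_fa = sum(s >= thr for s in nontarget) / len(nontarget)
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap = gap
            best_eer = (p_miss + p_fa) / 2
    return best_eer


def relative_improvement(baseline: float, proposed: float) -> float:
    """Relative reduction of an error rate with respect to a baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Toy scores, purely illustrative (not data from the paper).
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75]))
    print(relative_improvement(10.0, 8.635))
```

For example, a hypothetical baseline EER of 10.0 % reduced to 8.635 % gives `relative_improvement(10.0, 8.635)` ≈ 13.65, matching the relative figure quoted for VTL clustering; these absolute EER values are illustrative and are not taken from the paper. With few trials the threshold sweep is coarse; interpolating between neighbouring thresholds gives a smoother estimate.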

Acknowledgements

Part of this work was supported by SERC project fund SR/S3/EECE/058/2008 from the Department of Science and Technology, Ministry of Science and Technology, India.

Author information

Correspondence to A. K. Sarkar.

About this article

Cite this article

Sarkar, A.K., Umesh, S. Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector. Int J Speech Technol 15, 351–364 (2012). https://doi.org/10.1007/s10772-012-9149-1

  • DOI: https://doi.org/10.1007/s10772-012-9149-1
