Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector

Sarkar, A. K.; Umesh, S.

doi:10.1007/s10772-012-9149-1

Published: 15 June 2012

Volume 15, pages 351–364, (2012)

International Journal of Speech Technology

A. K. Sarkar^1,2 & S. Umesh^1

Abstract

In this paper, we investigate the use of Multiple Background Models (M-BMs) in Speaker Verification (SV). We cluster the speakers using either their Vocal Tract Lengths (VTLs) or their speaker-specific Maximum Likelihood Linear Regression (MLLR) super-vectors, and build a separate Background Model (BM) for each cluster. We show that M-BMs provide improved performance compared to a single or gender-wise Universal Background Model (UBM). While the computational complexity during test remains the same for both M-BMs and the UBM, M-BMs require switching of models depending on the claimant, and score normalization becomes difficult. To overcome these problems, we propose a novel method, inspired by the conventional Feature Mapping (FM) technique, that aggregates the information from the multiple background models into a single gender-independent UBM. We show that this approach improves on the conventional UBM method while still permitting easy use of score-normalization techniques. Compared to the conventional single-UBM system, the proposed method provides a relative improvement in Equal Error Rate (EER) of 13.65% with VTL clustering and 15.43% with MLLR super-vector clustering. When AT-norm score normalization is used, the proposed method provides a relative improvement in EER of 20.96% for VTL clustering and 22.48% for MLLR super-vector based clustering. Furthermore, the proposed method is compared with a gender-dependent speaker verification system using a Gaussian Mixture Model-Support Vector Machine (GMM-SVM) super-vector linear kernel. The experimental results show that the proposed method performs better than the gender-dependent speaker verification system.
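
The abstract reports its gains as relative improvements in EER. As a reading aid, the following is a minimal sketch of how an EER and a relative improvement might be computed from verification scores. It is not the authors' evaluation code: the function names, the threshold sweep, and all score and EER values are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor trials at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, in percent, with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores, for illustration only (not taken from the paper).
targets = [2.0, 3.0, 4.0, 5.0]
impostors = [0.0, 1.0, 2.5, 1.5]
eer = equal_error_rate(targets, impostors)

# Hypothetical baseline and new EER values, for illustration only.
gain = relative_improvement(10.0, 8.0)
```

A relative improvement is measured against the baseline EER, so a drop from a hypothetical 10% EER to 8% EER is a 20% relative improvement, even though the absolute difference is two percentage points.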

Notes

http://lia.univ-avignon.fr/heberges/alize/.

Acknowledgements

A part of this work was supported by SERC project fund SR/S3/EECE/058/2008 from the Department of Science and Technology, Ministry of Science and Technology, India.

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, India
A. K. Sarkar & S. Umesh
LIA, Universite D’Avignon, Avignon, France
A. K. Sarkar

Corresponding author

Correspondence to A. K. Sarkar.

About this article

Cite this article

Sarkar, A.K., Umesh, S. Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector. Int J Speech Technol 15, 351–364 (2012). https://doi.org/10.1007/s10772-012-9149-1

Received: 02 March 2012
Accepted: 25 May 2012
Published: 15 June 2012
Issue Date: September 2012
DOI: https://doi.org/10.1007/s10772-012-9149-1
