Improved i-vector extraction technique for speaker verification with short utterances

Poddar, Arnab; Sahidullah, Md; Saha, Goutam

doi:10.1007/s10772-017-9477-2

Published: 30 November 2017

Volume 21, pages 473–488, (2018)
International Journal of Speech Technology

Arnab Poddar¹,
Md Sahidullah² &
Goutam Saha¹

Abstract

A major challenge in automatic speaker verification (ASV) is to improve performance with short speech segments, for end-user convenience in real-world applications. In this paper, we present a detailed analysis of the effects of duration variability on state-of-the-art i-vector and classical Gaussian mixture model-universal background model (GMM-UBM) based ASV systems. We observe that the uncertainty of model parameter estimation in i-vector based ASV increases as speech duration decreases. To compensate for the effect of duration variability in short utterances, we propose an adaptation technique for the estimation of the Baum-Welch statistics used in i-vector extraction. The adaptation uses information from pre-estimated background model parameters. ASV performance with the proposed approach is considerably superior to that of the conventional i-vector based system. Furthermore, fusing the proposed i-vector based system with GMM-UBM further improves ASV performance, especially for short speech segments. Experiments conducted on two speech corpora, NIST SRE 2008 and 2010, show a relative improvement in equal error rate (EER) in the range of 12–20%.
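To make the quantities named in the abstract concrete, the following is a minimal sketch of how zeroth- and first-order Baum-Welch statistics are commonly computed against a diagonal-covariance UBM. The abstract does not give the authors' adaptation formula, so the function `smooth_stats_toward_ubm` and its `relevance` parameter are hypothetical: they show one possible way of blending pre-estimated UBM means into the statistics (a pseudo-count smoothing in the spirit of MAP mean adaptation), not the method proposed in the paper. All function names, variable names, and the toy data are assumptions made for illustration.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log-density of every frame under every diagonal-covariance component.
    X: (T, D) frames; means, variances: (C, D). Returns a (T, C) array."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                  # (T, C, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, C)
    log_det = np.sum(np.log(variances), axis=1)               # (C,)
    return -0.5 * (D * np.log(2.0 * np.pi) + log_det[None, :] + maha)

def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order (N) and first-order (F) statistics of X w.r.t. the UBM."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)   # stabilise before exponentiating
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)     # frame-level posteriors, (T, C)
    N = post.sum(axis=0)                        # (C,) soft frame counts
    F = post.T @ X                              # (C, D) posterior-weighted sums
    return N, F

def smooth_stats_toward_ubm(N, F, means, relevance=4.0):
    """HYPOTHETICAL smoothing, not the paper's formula: treat each UBM mean
    as `relevance` pseudo-frames, so sparsely observed components are pulled
    toward the background model."""
    N_s = N + relevance
    F_s = F + relevance * means
    return N_s, F_s

# Toy usage: a "short utterance" of 20 frames against a 4-component UBM.
rng = np.random.default_rng(0)
ubm_w = np.full(4, 0.25)
ubm_mu = rng.normal(size=(4, 3))
ubm_var = np.ones((4, 3))
X_short = rng.normal(size=(20, 3))

N, F = baum_welch_stats(X_short, ubm_w, ubm_mu, ubm_var)
N_s, F_s = smooth_stats_toward_ubm(N, F, ubm_mu)
# Per-component mean estimates implied by the smoothed statistics.
print(F_s / N_s[:, None])
```

With few frames, some components receive very small counts in `N`, which is one way to picture the estimation uncertainty the abstract describes; under this assumed smoothing, the implied mean of such a component stays close to the UBM mean.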

Acknowledgements

The authors would like to thank the Indian Space Research Organization (ISRO) for partial funding of this research. The authors would also like to express their gratitude to the members of the Audio and Bio-Signal Processing (ABSP) Lab, especially Mr. Monisankha Pal and Mrs. Shefali Waldekar, for mindful discussions and co-operation.

Author information

Authors and Affiliations

Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, 721302, India
Arnab Poddar & Goutam Saha
Speech and Image Processing Unit, School of Computing, University of Eastern Finland, 80101, Joensuu, Finland
Md Sahidullah

Corresponding author

Correspondence to Arnab Poddar.

About this article

Cite this article

Poddar, A., Sahidullah, M. & Saha, G. Improved i-vector extraction technique for speaker verification with short utterances. Int J Speech Technol 21, 473–488 (2018). https://doi.org/10.1007/s10772-017-9477-2

Received: 29 June 2017
Accepted: 02 November 2017
Published: 30 November 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10772-017-9477-2
