Efficient text-independent speaker recognition with short utterances in both clean and uncontrolled environments

Chakroun, Rania; Frikha, Mondher

doi:10.1007/s11042-020-08824-7


Published: 03 May 2020

Multimedia Tools and Applications, Volume 79, pages 21279–21298 (2020)


Abstract

Automatic speaker recognition has emerged as an important technology for voice-based biometric systems. However, text-independent speaker recognition with short utterances remains a challenging task despite recent advances in the field. Background noise is another critical issue. In this paper, we propose effective features for speaker identification with short utterances that perform well in both clean and noisy conditions. We report speaker identification performance for very short training and testing durations, which gives a clearer picture of how the proposed system behaves. The proposed features show strong robustness in these challenging situations and consistently outperform the well-known MFCC and GFCC features. The efficiency of the proposed approach was thoroughly tested through comparisons with two recent, successful baselines: SVM-based and i-vector/PLDA speaker recognition systems.
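The preview does not describe how the short-duration and noisy test conditions were built. As a rough illustration only, the sketch below shows one common way such conditions are constructed: truncating an utterance to a fixed number of seconds and mixing in additive noise scaled to a target signal-to-noise ratio, SNR_dB = 10·log10(P_speech / P_noise). The function names, the 16 kHz sampling rate, and the use of random white noise as stand-in audio are assumptions for this example, not the authors' protocol.

```python
import numpy as np


def crop_utterance(signal: np.ndarray, sample_rate: int, seconds: float) -> np.ndarray:
    """Keep only the first `seconds` of audio to simulate a short utterance (hypothetical helper)."""
    n = int(round(seconds * sample_rate))
    if n <= 0:
        raise ValueError("duration must be positive")
    return signal[:n]


def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` so the mixture has the requested SNR in dB (hypothetical helper)."""
    if len(noise) < len(clean):
        # Repeat the noise so it covers the whole utterance.
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Choose the gain so that 10*log10(p_clean / (gain**2 * p_noise)) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise


def measured_snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Compute the SNR of `noisy` relative to its clean reference."""
    residual = noisy - clean
    return float(10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000                            # assumed sampling rate
    speech = rng.standard_normal(5 * fs)  # stand-in for a real 5 s recording
    noise = rng.standard_normal(fs)       # stand-in for a 1 s noise recording
    short = crop_utterance(speech, fs, 2.0)
    noisy = add_noise_at_snr(short, noise, 5.0)
    print(len(short), round(measured_snr_db(short, noisy), 6))
```

Computing the noise power after tiling and cropping makes the measured SNR of the mixture match the target up to floating-point error. In practice the stand-in arrays would be replaced by real speech and noise recordings at a common sampling rate.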



Author information

Authors and Affiliations

Advanced Technologies for Image and Signal Processing (ATISP) Research Unit, Sfax, Tunisia
Rania Chakroun & Mondher Frikha
National School of Engineering of Sfax, University of Sfax, Sfax, Tunisia
Rania Chakroun
National School of Electronics and Telecommunications of Sfax, University of Sfax, Sfax, Tunisia
Mondher Frikha


Corresponding author

Correspondence to Rania Chakroun.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Chakroun, R., Frikha, M. Efficient text-independent speaker recognition with short utterances in both clean and uncontrolled environments. Multimed Tools Appl 79, 21279–21298 (2020). https://doi.org/10.1007/s11042-020-08824-7


Received: 02 April 2019
Revised: 21 December 2019
Accepted: 06 March 2020
Published: 03 May 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s11042-020-08824-7
