Abstract
Automatic speaker recognition has emerged as an important technology for voice-based biometric systems. However, text-independent speaker recognition on short utterances remains a challenging task despite recent advances in the field. Background noise is another critical issue. In this paper, we propose effective features for speaker identification with short utterances that perform well in both clean and noisy conditions. Speaker identification performance is reported for utterances with very short training and testing durations, which gives a clearer picture of the proposed system's performance. The proposed features show strong robustness in these challenging situations and consistently perform better than the well-known MFCC and GFCC features. The efficiency of the proposed approach was thoroughly tested by comparison with recent, successful SVM and i-vector PLDA baseline speaker recognition systems.
Cite this article
Chakroun, R., Frikha, M. Efficient text-independent speaker recognition with short utterances in both clean and uncontrolled environments. Multimed Tools Appl 79, 21279–21298 (2020). https://doi.org/10.1007/s11042-020-08824-7
DOI: https://doi.org/10.1007/s11042-020-08824-7