Efficient text-independent speaker recognition with short utterances in both clean and uncontrolled environments

Multimedia Tools and Applications

Abstract

Automatic speaker recognition has emerged as an important technology for voice-based biometric systems. However, text-independent speaker recognition with short utterances remains a challenging task despite recent advances in the field. Background noise is another critical issue. In this paper, we propose effective features for speaker identification with short utterances, which perform well in both clean and noisy conditions. Speaker identification performance for utterances with very short training and testing durations is presented, which provides a clearer description of the proposed system's performance. The proposed features show strong robustness in these challenging situations and consistently perform better than the well-known MFCC and GFCC features. The efficiency of the proposed approach was thoroughly tested by comparison with recent successful SVM and i-vector PLDA baseline speaker recognition systems.

Author information

Corresponding author

Correspondence to Rania Chakroun.

About this article

Cite this article

Chakroun, R., Frikha, M. Efficient text-independent speaker recognition with short utterances in both clean and uncontrolled environments. Multimed Tools Appl 79, 21279–21298 (2020). https://doi.org/10.1007/s11042-020-08824-7

  • DOI: https://doi.org/10.1007/s11042-020-08824-7
