Robust Methods for Text-Dependent Speaker Verification

Circuits, Systems, and Signal Processing

Abstract

In this work, we explore noise-robust techniques at different stages of a text-dependent speaker verification (TDSV) system. At the signal level, noise is compensated with a robust end point detection technique based on speech-specific knowledge. At the feature level, compensation uses robust features extracted from the Hilbert spectrum (HS) of the intrinsic mode functions obtained by modified empirical mode decomposition of speech. We also explore a combined temporal and spectral speech enhancement technique, applied before end point detection, to enhance speech regions embedded in noise. All experiments are conducted on two databases, RSR2015 and the IITG database. Robust end point detection improves the performance of the TDSV system over energy-based end point detection in both clean and degraded speech conditions. Using the noise-robust HS features together with Mel-frequency cepstral coefficients further improves performance. Applying speech enhancement before the signal-level and feature-level compensation yields a further improvement in the low-SNR cases. The final system, which combines the three robust methods, gives a relative improvement in equal error rate (EER) of 6 to 25% on the RSR2015 database corrupted with babble noise of varying strength, and of around 30 to 45% on the IITG database.
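The results above are stated as relative improvements in EER. As a reading aid, the following minimal Python sketch shows one common way to approximate an EER from verification scores and to compute a relative improvement between two systems. It is not the authors' evaluation code: the function names, the threshold sweep, and all scores and EER values in the usage note are hypothetical and chosen only for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over every observed score.

    Assumption: higher scores mean "more likely the claimed speaker".
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, improved_eer):
    """Relative EER reduction, in percent, with respect to the baseline."""
    return 100.0 * (baseline_eer - improved_eer) / baseline_eer


# Hypothetical scores, not taken from the paper.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))

# Hypothetical EERs (in percent) for a baseline and an improved system.
print(relative_improvement(10.0, 8.0))
```

With these made-up inputs, a baseline EER of 10% reduced to 8% corresponds to a 20% relative improvement, which is the sense in which the 6 to 25% and 30 to 45% figures are reported.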

Author information

Corresponding author

Correspondence to Ramesh K. Bhukya.

About this article

Cite this article

Bhukya, R.K., Prasanna, S.R.M. & Sarma, B.D. Robust Methods for Text-Dependent Speaker Verification. Circuits Syst Signal Process 38, 5253–5288 (2019). https://doi.org/10.1007/s00034-019-01125-x

  • DOI: https://doi.org/10.1007/s00034-019-01125-x
