Exploring Text-Constraint Models and Source Information for Long-Enrollment with Short-Test Speaker Verification

Circuits, Systems, and Signal Processing

Abstract

This work focuses on long-enrollment with short-test speaker verification (SV) from the perspective of application-oriented systems. The importance of phonetic match between train and test models is explored using a text-constraint model-based framework on Part IV of the RedDots database. This database has text-dependent and text-prompted enrollment conditions for speaker modeling. Two different text-constraint setups are formalized to evaluate the effect of text match between train and test sessions. Further, three excitation source features, namely mel power difference of spectrum in subbands, residual mel frequency cepstral coefficients, and discrete cosine transform of the integrated linear prediction residual, are investigated to determine their significance for the text-constraint-based framework. Although the source features individually perform worse than the conventional mel frequency cepstral coefficient (MFCC) features, their significance is reflected in fusion because the information they carry is complementary. Additionally, when fused with MFCC features, the source features prove important for text-constraint-based models in long-enrollment with short-test SV and achieve a notable improvement over the baseline framework for the text-prompted enrollment condition. This narrows the performance difference between the text-dependent and text-prompted enrollment conditions, showing the importance of text-constraint models and source information in a long-enrollment with short-test framework, which is favorable for field-deployable systems.
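
The abstract states that source features help in fusion with MFCC features but does not specify the fusion scheme. As an illustration only, the sketch below assumes a weighted-sum fusion at the score level after per-system z-normalization, which is one common choice and may differ from what the authors used. The function names, the `weight` parameter, and the toy scores are all hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale a list of scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0.0:
        # All scores identical: no spread to normalize.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(mfcc_scores, source_scores, weight=0.5):
    """Weighted sum of two normalized score lists, one score per trial.

    `weight` is applied to the MFCC system and (1 - weight) to the
    source-feature system. This scheme is an assumption for illustration.
    """
    if len(mfcc_scores) != len(source_scores):
        raise ValueError("both systems must score the same trials")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    a = z_normalize(mfcc_scores)
    b = z_normalize(source_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


# Hypothetical toy scores for four trials, not data from the paper.
mfcc = [2.0, 1.0, -1.0, -2.0]
source = [1.0, 3.0, -3.0, -1.0]
fused = fuse_scores(mfcc, source, weight=0.7)
```

Normalizing each system before summing keeps one score range from dominating the other. In practice the weight would typically be tuned on held-out development trials rather than fixed by hand.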

Notes

  1. https://sites.google.com/site/thereddotsproject/.

Author information

Correspondence to Rohan Kumar Das.

Cite this article

Das, R.K., Jelil, S. & Prasanna, S.R.M. Exploring Text-Constraint Models and Source Information for Long-Enrollment with Short-Test Speaker Verification. Circuits Syst Signal Process 38, 1775–1792 (2019). https://doi.org/10.1007/s00034-018-0937-y

  • DOI: https://doi.org/10.1007/s00034-018-0937-y