Abstract
This work focuses on long-enrollment with short-test speaker verification (SV) from the perspective of application-oriented systems. The importance of phonetic match between train and test models is explored using a text-constraint model-based framework on Part IV of the RedDots database. This database provides text-dependent and text-prompted enrollment conditions for speaker modeling. Two different text-constraint setups are formalized to evaluate the effect of text match between train and test sessions. Further, three excitation source features are investigated to determine their significance for the text-constraint-based framework: mel power difference of spectrum in subbands, residual mel frequency cepstral coefficient, and discrete cosine transform of integrated linear prediction residual. Although the source features individually perform worse than the conventional mel frequency cepstral coefficient (MFCC) features, their significance is reflected in fusion, owing to the complementary nature of the information they carry. Additionally, in fusion with MFCC features, the source features become imperative for text-constraint-based models for long-enrollment with short-test SV and achieve a commendable improvement over the baseline framework of the text-prompted enrollment condition. This minimizes the performance difference between the text-dependent and text-prompted enrollment conditions, showing the importance of text-constraint models and source information in a long-enrollment with short-test framework, which is favorable from the perspective of field-deployable systems.
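The abstract does not state how the MFCC and source-feature systems are fused. As a minimal sketch, assuming a weighted-sum fusion of per-trial scores (a common choice, not confirmed here as the authors' method), the idea of combining complementary systems and measuring the effect with an equal error rate (EER) might look as follows. The function names `fuse_scores` and `equal_error_rate` and the default weight are hypothetical and chosen only for illustration.

```python
def fuse_scores(mfcc_scores, source_scores, weight=0.7):
    """Weighted-sum fusion of two systems' trial scores.

    Assumption: both score lists cover the same trials in the same order
    and are already on comparable scales. `weight` is a hypothetical
    value for the MFCC system; the source system gets 1 - weight.
    """
    if len(mfcc_scores) != len(source_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * m + (1.0 - weight) * s
            for m, s in zip(mfcc_scores, source_scores)]


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    labels: truthy for target (same-speaker) trials, falsy for
    non-target trials. A trial is accepted when score >= threshold.
    Returns the mean of false-acceptance and false-rejection rates at
    the threshold where the two rates are closest.
    """
    targets = [s for s, y in zip(scores, labels) if y]
    nontargets = [s for s, y in zip(scores, labels) if not y]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        frr = sum(s < threshold for s in targets) / len(targets)
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

In practice the fusion weight would be tuned on held-out development trials, and the scores of each system are usually normalized before they are combined so that neither system dominates purely because of its score range.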
Cite this article
Das, R.K., Jelil, S. & Prasanna, S.R.M. Exploring Text-Constraint Models and Source Information for Long-Enrollment with Short-Test Speaker Verification. Circuits Syst Signal Process 38, 1775–1792 (2019). https://doi.org/10.1007/s00034-018-0937-y
DOI: https://doi.org/10.1007/s00034-018-0937-y