Exploring Text-Constraint Models and Source Information for Long-Enrollment with Short-Test Speaker Verification

Das, Rohan Kumar; Jelil, Sarfaraz; Prasanna, S. R. Mahadeva

doi:10.1007/s00034-018-0937-y


Published: 11 September 2018

Volume 38, pages 1775–1792 (2019)

Circuits, Systems, and Signal Processing

Rohan Kumar Das ORCID: orcid.org/0000-0002-1332-3357¹,
Sarfaraz Jelil² &
S. R. Mahadeva Prasanna²


Abstract

This work focuses on long-enrollment with short-test speaker verification (SV) from the perspective of application-oriented systems. The importance of phonetic match between train and test models is explored through a text-constraint model-based framework on Part IV of the RedDots database, which has text-dependent and text-prompted enrollment conditions for speaker modeling. Two different text-constraint setups are formalized to evaluate the effect of text match between train and test sessions. Further, the excitation source features mel power difference of spectrum in subbands, residual mel frequency cepstral coefficients and discrete cosine transform of integrated linear prediction residual are investigated to determine their significance for the text-constraint-based framework. Although the source features individually perform worse than the conventional mel frequency cepstral coefficient (MFCC) features, their significance is reflected in fusion owing to the complementary nature of the information they carry. Additionally, the source features become imperative for text-constraint-based models in long-enrollment with short-test SV when fused with MFCC features, achieving a notable improvement over the baseline framework of the text-prompted enrollment condition. This minimizes the performance difference between the text-dependent and text-prompted enrollment conditions, showing the importance of text-constraint models and source information in a long-enrollment with short-test framework, which is favorable for field-deployable systems.
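The abstract's central claim is that source features help mainly through fusion with MFCC scores. The following is a minimal, hypothetical sketch of generic score-level fusion, not the authors' method. The abstract does not state how the subsystem scores are combined. The sketch therefore assumes per-subsystem z-normalization followed by a weighted sum, where `weight` is a hypothetical tuning parameter. It also shows how the fused scores could be evaluated with an equal error rate (EER) sweep, a common SV metric. The names `znorm`, `fuse` and `equal_error_rate` are illustrative only.

```python
from statistics import mean, pstdev


def znorm(scores):
    """Standardize one subsystem's trial scores to zero mean and unit variance.

    This normalization is an assumption made so that MFCC and source-feature
    scores live on comparable scales before they are combined.
    """
    mu = mean(scores)
    sd = pstdev(scores) or 1.0  # guard against constant score lists
    return [(s - mu) / sd for s in scores]


def fuse(mfcc_scores, source_scores, weight=0.5):
    """Weighted-sum score-level fusion of two subsystems over the same trials.

    `weight` (hypothetical) is the share given to the MFCC subsystem; the
    source-feature subsystem receives the remainder.
    """
    if len(mfcc_scores) != len(source_scores):
        raise ValueError("both subsystems must score the same trials")
    a, b = znorm(mfcc_scores), znorm(source_scores)
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns the mean of the false-acceptance and false-rejection rates at the
    threshold where the two rates are closest.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    nontargets = [s for s, l in zip(scores, labels) if l == 0]
    best = None
    for t in sorted(set(scores)):
        frr = sum(s < t for s in targets) / len(targets)  # rejected targets
        far = sum(s >= t for s in nontargets) / len(nontargets)  # accepted impostors
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

In practice, one might sweep `weight` on a held-out development set and keep the value with the lowest EER. The abstract does not say whether the paper tunes its fusion this way.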


Notes

https://sites.google.com/site/thereddotsproject/.


Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 117583, Singapore
Rohan Kumar Das
Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Sarfaraz Jelil & S. R. Mahadeva Prasanna


Corresponding author

Correspondence to Rohan Kumar Das.


About this article

Cite this article

Das, R.K., Jelil, S. & Prasanna, S.R.M. Exploring Text-Constraint Models and Source Information for Long-Enrollment with Short-Test Speaker Verification. Circuits Syst Signal Process 38, 1775–1792 (2019). https://doi.org/10.1007/s00034-018-0937-y


Received: 05 June 2017
Revised: 27 August 2018
Accepted: 29 August 2018
Published: 11 September 2018
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s00034-018-0937-y
