Significance of duration modification for speaker verification under mismatch speech tempo condition

Das, Rohan Kumar; Sharma, Bidisha; Prasanna, S. R. Mahadeva

doi:10.1007/s10772-017-9474-5


Published: 15 November 2017

International Journal of Speech Technology, Volume 21, pages 401–408 (2018)

ORCID (Rohan Kumar Das): orcid.org/0000-0002-1332-3357


Abstract

This work explores the scope of duration modification for speaker verification (SV) under mismatched speech tempo conditions. SV performance is found to depend on the speaking rate of the speaker. A mismatch in speaking rate between training and test speech can degrade system performance, which is a crucial concern for deployable systems. In this work, SV performance is analyzed by varying the speaking rate of the training and test speech. Based on these studies, a framework is proposed to compensate for the mismatch in speech tempo. The framework modifies the duration of the test speech, in terms of speaking rate, according to a mismatch factor derived between the training and test speech. This in turn matches the speech tempo of the test speech to that of the claimed speaker model. The proposed approach is found to have a significant impact on SV performance under mismatched conditions. A set of practical data with mismatched speech tempo is also used to cross-validate the framework.
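The compensation idea can be made concrete with a small sketch. If the test utterance contains N speech units spoken at rate r_test, its duration is N / r_test; to match the claimed speaker's training rate r_train, it must last N / r_train, so its duration is multiplied by r_test / r_train. The abstract does not say how speaking rate is measured or which duration-modification technique the authors use, so everything below is an assumption: the rate is taken to be a count of units (for example syllables) per second, and the duration change uses plain Hann-windowed overlap-add (OLA) time-scale modification as a stand-in. The helper names `speaking_rate`, `duration_scale_factor`, `ola_time_scale` and `match_test_tempo` are hypothetical.

```python
import math


def speaking_rate(num_units, duration_s):
    """Assumed rate measure: speech units (e.g. syllables) per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return num_units / duration_s


def duration_scale_factor(train_rate, test_rate):
    """Factor by which the test duration must be multiplied so that its
    rate equals the training rate: N/r_train = (N/r_test) * (r_test/r_train)."""
    if train_rate <= 0 or test_rate <= 0:
        raise ValueError("rates must be positive")
    return test_rate / train_rate


def ola_time_scale(x, factor, frame_len=400, hop_syn=100):
    """Plain overlap-add time-scale modification (a stand-in technique,
    not necessarily the paper's). factor > 1 lengthens (slows) the speech,
    factor < 1 shortens it. Frames are read every hop_syn/factor samples
    and written every hop_syn samples, then normalized by the summed window.
    No waveform alignment is done, so audible phase artifacts are possible."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    hop_ana = hop_syn / factor
    # Periodic Hann analysis window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_frames = max(1, int((len(x) - frame_len) / hop_ana) + 1)
    out_len = (n_frames - 1) * hop_syn + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for k in range(n_frames):
        a = int(round(k * hop_ana))  # analysis (input) position
        s = k * hop_syn              # synthesis (output) position
        for n in range(frame_len):
            idx = a + n
            if idx >= len(x):
                break
            w = window[n]
            y[s + n] += w * x[idx]
            norm[s + n] += w
    # Weighted average of overlapping frames; zero where no window energy.
    return [v / nv if nv > 1e-8 else 0.0 for v, nv in zip(y, norm)]


def match_test_tempo(test_signal, train_rate, test_rate):
    """Rescale the test signal so its tempo matches the claimed speaker's."""
    return ola_time_scale(test_signal,
                          duration_scale_factor(train_rate, test_rate))
```

For example, if the claimed speaker's training speech runs at 4 units/s and the test speech at 5 units/s, the factor is 1.25 and the test signal is stretched to roughly 1.25 times its length before scoring. A practical system would more likely use a waveform-similarity variant such as SOLA/WSOLA, or a pitch-synchronous method, to avoid the phase discontinuities of plain OLA.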



Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Rohan Kumar Das, Bidisha Sharma & S. R. Mahadeva Prasanna


Corresponding author

Correspondence to Rohan Kumar Das.


About this article

Cite this article

Das, R.K., Sharma, B. & Prasanna, S.R.M. Significance of duration modification for speaker verification under mismatch speech tempo condition. Int J Speech Technol 21, 401–408 (2018). https://doi.org/10.1007/s10772-017-9474-5


Received: 14 July 2017
Accepted: 02 November 2017
Published: 15 November 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10772-017-9474-5
