Significance of duration modification for speaker verification under mismatch speech tempo condition

International Journal of Speech Technology

Abstract

This work explores the scope of duration modification for speaker verification (SV) under mismatched speech tempo conditions. SV performance is found to depend on the speaking rate of the speaker. A mismatch in speaking rate between train and test speech can degrade system performance, which makes it a crucial concern for deployable systems. In this work, SV performance is analyzed by varying the speaking rate of the train and test speech. Based on these studies, a framework is proposed to compensate for the mismatch in speech tempo. The framework changes the duration of the test speech, and hence its speaking rate, according to a mismatch factor derived between the train and test speech. This in turn matches the tempo of the test speech to that of the claimed speaker model. The proposed approach is found to have a significant impact on SV performance under mismatched conditions. A set of practical data with speech tempo mismatch is also used to cross-validate the framework.
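The compensation idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say how speaking rate is measured or which duration modification method is used, so the sketch takes speaking rates as given numbers (for example, syllables per second) and uses a plain windowed overlap-add (OLA) time stretch as a stand-in. The names `mismatch_factor` and `ola_stretch` are hypothetical.

```python
import numpy as np


def mismatch_factor(train_rate, test_rate):
    """Duration scaling that brings the test tempo to the train tempo.

    Rates are assumed to be in the same unit (e.g. syllables per second).
    A faster test utterance (test_rate > train_rate) gives a factor > 1,
    i.e. the test speech must be lengthened.
    """
    return test_rate / train_rate


def ola_stretch(x, factor, frame_len=400, synth_hop=100):
    """Scale the duration of x by `factor` using windowed overlap-add.

    Frames are read from the input every synth_hop / factor samples and
    written to the output every synth_hop samples. Plain OLA is only a
    stand-in here; it is not claimed to be the method used in the paper.
    """
    x = np.asarray(x, dtype=float)
    ana_hop = synth_hop / factor
    n_frames = int((len(x) - frame_len) / ana_hop) + 1
    window = np.hanning(frame_len)
    out_len = (n_frames - 1) * synth_hop + frame_len
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        a = int(round(i * ana_hop))  # read position in the input
        s = i * synth_hop            # write position in the output
        y[s:s + frame_len] += window * x[a:a + frame_len]
        norm[s:s + frame_len] += window
    # Divide by the summed window so overlapping frames do not change gain.
    return y / np.maximum(norm, 1e-8)


# Hypothetical example: model trained at 4 units/s, test spoken at 5 units/s.
factor = mismatch_factor(train_rate=4.0, test_rate=5.0)
rng = np.random.default_rng(0)
test_speech = rng.standard_normal(16000)  # placeholder for 1 s at 16 kHz
matched = ola_stretch(test_speech, factor)
```

In this sketch the stretched signal would then be passed to feature extraction and scored against the claimed speaker model in place of the original test utterance. Plain OLA can introduce audible artifacts on voiced speech, so a practical system would likely need a more careful time-scale modification method.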

Author information

Corresponding author

Correspondence to Rohan Kumar Das.

About this article

Cite this article

Das, R.K., Sharma, B. & Prasanna, S.R.M. Significance of duration modification for speaker verification under mismatch speech tempo condition. Int J Speech Technol 21, 401–408 (2018). https://doi.org/10.1007/s10772-017-9474-5

  • DOI: https://doi.org/10.1007/s10772-017-9474-5
