Multi-style speaker recognition database in practical conditions

Das, Rohan Kumar; Jelil, Sarfaraz; Prasanna, S. R. Mahadeva

doi:10.1007/s10772-017-9475-4


Published: 20 November 2017

International Journal of Speech Technology, Volume 21, pages 409–419 (2018)

Rohan Kumar Das ORCID: orcid.org/0000-0002-1332-3357¹,
Sarfaraz Jelil¹ &
S. R. Mahadeva Prasanna¹


Abstract

This work describes the collection and organization of a multi-style database for speaker recognition. The database is organized around three categories of speaker recognition: the voice-password, text-dependent, and text-independent frameworks. Three Indian institutes collaborated to collect the data at their respective sites. The data were collected over an online telephone network deployed for a speech-based student attendance system. This setup allows data to be collected over a long period from different speakers with session variability, which is useful for speaker verification (SV) studies in practical scenarios. The database contains data from 923 speakers across the three SV modes and is therefore termed a multi-style speaker recognition database. It is useful for studies of session variability, multi-style speaker recognition, and short-utterance SV. Initial results are reported on the database for the three SV modes. A copy of the database can be obtained by contacting the authors.



Acknowledgements

This work is supported by project grant 12(6)/2012-ESD for the project entitled “Development of Speech-Based Multi-level Person Authentication System”, funded by the Department of Electronics and Information Technology (DeitY), Govt. of India. The authors would also like to acknowledge the consortium teams of the project at North-Eastern Hill University (NEHU), Shillong, led by Dr. L. Joyprakash Singh, and at the National Institute of Technology (NIT) Silchar, led by Dr. R. H. Lashkar, for their contribution to the collection and organization of the database.

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Rohan Kumar Das, Sarfaraz Jelil & S. R. Mahadeva Prasanna


Corresponding author

Correspondence to Rohan Kumar Das.


About this article

Cite this article

Das, R.K., Jelil, S. & Prasanna, S.R.M. Multi-style speaker recognition database in practical conditions. Int J Speech Technol 21, 409–419 (2018). https://doi.org/10.1007/s10772-017-9475-4


Received: 14 July 2017
Accepted: 02 November 2017
Published: 20 November 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10772-017-9475-4
