
Development and evaluation of online text-independent speaker verification system for remote person authentication

International Journal of Speech Technology

Abstract

This paper describes an online text-independent speaker verification system developed at IIT Guwahati under multivariability conditions for remote person authentication. The system runs on a voice server accessible over the telephone network through an interactive voice response (IVR) system, so that both enrollment and testing can be done online. It uses Mel-frequency cepstral coefficients (MFCC) for feature extraction and a Gaussian mixture model-universal background model (GMM-UBM) for speaker modeling. System performance under multivariability conditions is evaluated using online enrollment and test sessions from the subjects. The evaluation helps in understanding the impact, in an online setting, of several well-known issues in speaker verification, such as environmental noise, the duration of test speech, and robustness against replayed recorded speech. These issues need to be addressed when developing and deploying speaker verification systems in real-life applications.
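As an illustration of the GMM-UBM modeling step named in the abstract, the following is a minimal sketch, not the authors' implementation. The abstract gives no configuration, so everything numeric here is an assumption: toy 2-D vectors stand in for MFCC frames, the UBM has two hand-set diagonal-covariance components instead of being trained, the speaker model is derived by mean-only MAP adaptation with a relevance factor of 16, and the decision threshold is 0. All function and variable names are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def component_log_probs(x, gmm):
    """log(weight * density) of every mixture component for one frame."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])]

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under the GMM."""
    return sum(logsumexp(component_log_probs(x, gmm)) for x in frames) / len(frames)

def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only MAP adaptation; weights and variances stay those of the UBM."""
    k, d = len(ubm["weights"]), len(ubm["means"][0])
    counts = [0.0] * k                    # soft frame counts per component
    sums = [[0.0] * d for _ in range(k)]  # posterior-weighted frame sums
    for x in frames:
        log_p = component_log_probs(x, ubm)
        norm = logsumexp(log_p)
        for c in range(k):
            post = math.exp(log_p[c] - norm)
            counts[c] += post
            for j in range(d):
                sums[c][j] += post * x[j]
    # Interpolate between the data mean and the UBM mean of each component.
    means = [[(sums[c][j] + relevance * ubm["means"][c][j]) / (counts[c] + relevance)
              for j in range(d)] for c in range(k)]
    return {"weights": ubm["weights"], "means": means, "vars": ubm["vars"]}

def llr_score(frames, speaker, ubm):
    """Average log-likelihood ratio of the claimed speaker model against the UBM."""
    return avg_log_likelihood(frames, speaker) - avg_log_likelihood(frames, ubm)

def draw_frames(rng, centers, n):
    """n toy frames, each a unit-variance Gaussian draw around a random center."""
    return [[rng.gauss(m, 1.0) for m in rng.choice(centers)] for _ in range(n)]

# Toy setup (assumed values): a hand-set UBM and two synthetic "speakers".
rng = random.Random(0)
ubm = {"weights": [0.5, 0.5],
       "means": [[0.0, 0.0], [4.0, 4.0]],
       "vars": [[1.0, 1.0], [1.0, 1.0]]}
target_centers = [[1.0, 1.0], [5.0, 5.0]]
impostor_centers = [[-1.0, -1.0], [3.0, 3.0]]

# Enrollment: adapt the UBM to the target speaker's frames.
speaker_model = map_adapt_means(ubm, draw_frames(rng, target_centers, 200))

# Testing: score a genuine trial and an impostor trial against the claim.
target_score = llr_score(draw_frames(rng, target_centers, 100), speaker_model, ubm)
impostor_score = llr_score(draw_frames(rng, impostor_centers, 100), speaker_model, ubm)
THRESHOLD = 0.0  # assumed operating point; a real system tunes this on held-out data
print("target accepted:", target_score > THRESHOLD)
print("impostor accepted:", impostor_score > THRESHOLD)
```

In a deployed system the frames would come from an MFCC front end applied to telephone speech, and the UBM would be trained on data pooled from many speakers. The sketch only shows how enrollment (adaptation) and testing (likelihood-ratio scoring) fit together.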



Acknowledgements

This work has been supported by the project grant No. 12(4)/2009-ESD sponsored by the Department of Information Technology, Government of India.

Author information

Corresponding author

Correspondence to S. R. Mahadeva Prasanna.


About this article

Cite this article

Chakrabarty, D., Prasanna, S.R.M. & Das, R.K. Development and evaluation of online text-independent speaker verification system for remote person authentication. Int J Speech Technol 16, 75–88 (2013). https://doi.org/10.1007/s10772-012-9160-6


  • DOI: https://doi.org/10.1007/s10772-012-9160-6
