Abstract
This paper describes an online text-independent speaker verification system developed at IIT Guwahati for remote person authentication under multivariability conditions. The system runs on a voice server accessible over the telephone network through an interactive voice response (IVR) system, so that both enrollment and testing can be done online. It uses Mel-frequency cepstral coefficients (MFCC) for feature extraction and a Gaussian mixture model-universal background model (GMM-UBM) for speaker modeling. Performance under multivariability conditions is evaluated using online enrollment and test sessions from the subjects. The evaluation helps in understanding how several well-known issues in speaker verification affect an online system, such as environmental noise, the duration of the test speech, and robustness against playback of recorded speech. These issues need to be addressed before a speaker verification system can be developed and deployed in real-life applications.
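As a rough illustration of the GMM-UBM modeling step named above, the following Python sketch adapts a universal background model to a speaker and scores a test utterance by its average log-likelihood ratio. It is a minimal sketch under stated assumptions, not the system described in the paper: the function names, the mean-only MAP adaptation, the relevance factor of 16, the two-component diagonal-covariance UBM, and the synthetic 2-D frames standing in for MFCC vectors are all illustrative choices.

```python
import numpy as np

def component_log_probs(X, weights, means, variances):
    """Log of (weight * Gaussian density) for every frame and component.

    X: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    Covariances are assumed diagonal.
    """
    diff = X[:, None, :] - means[None, :, :]                    # (T, M, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, M)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    return np.log(weights) - 0.5 * (mahal + log_norm)

def frame_log_likelihood(X, weights, means, variances):
    """Per-frame GMM log-likelihood, computed with a stable log-sum-exp."""
    lp = component_log_probs(X, weights, means, variances)
    peak = lp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(lp - peak).sum(axis=1, keepdims=True)))[:, 0]

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to enrollment frames X.

    The relevance factor is an assumed conventional value.
    """
    lp = component_log_probs(X, weights, means, variances)
    ll = frame_log_likelihood(X, weights, means, variances)
    post = np.exp(lp - ll[:, None])                      # responsibilities (T, M)
    n = post.sum(axis=0)                                 # soft counts (M,)
    first = post.T @ X / np.maximum(n, 1e-10)[:, None]   # data means (M, D)
    alpha = (n / (n + relevance))[:, None]               # adaptation weights
    return alpha * first + (1.0 - alpha) * means

def llr_score(X, ubm, speaker_means):
    """Average log-likelihood ratio of speaker model versus UBM."""
    weights, means, variances = ubm
    spk = frame_log_likelihood(X, weights, speaker_means, variances)
    bg = frame_log_likelihood(X, weights, means, variances)
    return float(np.mean(spk - bg))

# Synthetic demo: a hypothetical 2-component UBM and a speaker whose
# frames are offset from the background distribution.
rng = np.random.default_rng(0)
ubm = (np.array([0.5, 0.5]),
       np.array([[-2.0, 0.0], [2.0, 0.0]]),
       np.ones((2, 2)))

def sample(n, shift):
    """Draw n frames from the UBM components, offset by `shift`."""
    comp = rng.integers(0, 2, size=n)
    return ubm[1][comp] + shift + rng.standard_normal((n, 2))

speaker_shift = np.array([0.0, 1.5])
speaker_means = map_adapt_means(sample(500, speaker_shift), *ubm)
target_score = llr_score(sample(300, speaker_shift), ubm, speaker_means)
impostor_score = llr_score(sample(300, np.zeros(2)), ubm, speaker_means)
print(target_score, impostor_score)
```

In a deployed system the score would be compared against a decision threshold tuned on held-out data, and the frames would come from an MFCC front end rather than a random generator.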
Acknowledgements
This work has been supported by the project grant No. 12(4)/2009-ESD sponsored by the Department of Information Technology, Government of India.
About this article
Cite this article
Chakrabarty, D., Prasanna, S.R.M. & Das, R.K. Development and evaluation of online text-independent speaker verification system for remote person authentication. Int J Speech Technol 16, 75–88 (2013). https://doi.org/10.1007/s10772-012-9160-6