Speaker verification using excitation source information

Pati, Debadatta; Mahadeva Prasanna, S. R.

doi:10.1007/s10772-012-9137-5

Published: 08 March 2012

Volume 15, pages 241–257, (2012)

International Journal of Speech Technology

Debadatta Pati¹ &
S. R. Mahadeva Prasanna¹

Abstract

In this work we develop a speaker recognition system based on excitation source information and demonstrate its significance by comparing it with a system based on vocal tract information. The speaker-specific excitation information is extracted by subsegmental, segmental and suprasegmental processing of the linear prediction (LP) residual. The speaker-specific information from each level is modeled independently using Gaussian mixture model-universal background model (GMM-UBM) modeling, and the resulting evidence is then combined at the score level. The significance of the proposed speaker recognition system is demonstrated by conducting speaker verification experiments on the NIST-03 database. Two tests are conducted: a Clean test and a Noisy test. In the Clean test, the test speech signal is used as is for verification. In the Noisy test, the test speech is corrupted by factory noise (9 dB) and then used for verification. In the Clean test, the proposed source-based speaker recognition system performs worse than the vocal tract system, but in the Noisy test it performs better. Finally, in both the clean and noisy cases, by providing different and robust speaker-specific evidence, the proposed system helps the vocal tract system to further improve the overall performance.
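
The two generic building blocks named in the abstract, the LP residual and score-level combination, can be illustrated with a minimal sketch. This is not the authors' implementation: the LP order of 10, the autocorrelation method with Levinson-Durbin recursion, the weighted-sum fusion rule, and the function names `lp_residual` and `fuse_scores` are all assumptions made for illustration, and the GMM-UBM modeling stage is omitted.

```python
import random


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations by Levinson-Durbin recursion.

    Returns predictor coefficients a[1..order] such that the predicted
    sample is s_hat[n] = sum_k a[k] * s[n - k].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lp_residual(frame, order=10):
    """Inverse-filter a frame with its own LP coefficients.

    The order of 10 is an assumed default, not a value from the paper.
    Samples before the start of the frame are treated as zero.
    """
    r = autocorrelation(frame, order)
    a = levinson_durbin(r, order)
    residual = []
    for n, sample in enumerate(frame):
        prediction = sum(a[k - 1] * frame[n - k]
                         for k in range(1, order + 1) if n - k >= 0)
        residual.append(sample - prediction)
    return residual


def fuse_scores(scores, weights):
    """Weighted-sum score fusion (an assumed rule, normalized by weight sum)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, scores)) / total


if __name__ == "__main__":
    random.seed(0)
    # Synthetic, strongly correlated signal standing in for a speech frame.
    frame = [0.0, 0.0]
    for _ in range(400):
        frame.append(1.3 * frame[-1] - 0.6 * frame[-2] + random.gauss(0.0, 1.0))
    residual = lp_residual(frame, order=10)
    print("signal energy:  ", sum(s * s for s in frame))
    print("residual energy:", sum(e * e for e in residual))
    # Hypothetical per-level scores (subsegmental, segmental, suprasegmental)
    # and hypothetical weights, for illustration only.
    print("fused score:", fuse_scores([0.8, 1.1, 0.4], [0.5, 0.3, 0.2]))
```

In a full system each level of residual processing would yield its own feature stream and its own model score; the fusion function here only shows how such per-level scores could be merged into a single verification score.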

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Debadatta Pati & S. R. Mahadeva Prasanna

Corresponding author

Correspondence to S. R. Mahadeva Prasanna.

About this article

Cite this article

Pati, D., Mahadeva Prasanna, S.R. Speaker verification using excitation source information. Int J Speech Technol 15, 241–257 (2012). https://doi.org/10.1007/s10772-012-9137-5

Received: 16 April 2011
Accepted: 20 February 2012
Published: 08 March 2012
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10772-012-9137-5
