Robust Speaker Identification System Based on Wavelet Transform and Gaussian Mixture Model

Chen, Wan-Chen; Hsieh, Ching-Tang; Lai, Eugene

doi:10.1007/978-3-540-30211-7_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

This paper presents an effective method for improving the performance of a speaker identification system. Based on the multiresolution property of the wavelet transform, the input speech signal is decomposed into several frequency bands so that noise distortions are not spread over the entire feature space. The linear predictive cepstral coefficients (LPCCs) of each band are calculated, and cepstral mean normalization is applied to all computed features. We evaluate feature recombination and likelihood recombination methods on the task of text-independent speaker identification. The feature recombination scheme combines the cepstral coefficients of each band to form a single feature vector used to train the Gaussian mixture model (GMM). The likelihood recombination scheme combines the likelihood scores of independent GMMs, one for each band. Experimental results show that both proposed methods outperform a GMM using full-band LPCCs and mel-frequency cepstral coefficients (MFCCs) in both clean and noisy environments.
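
The two recombination schemes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the wavelet band splitting and LPCC extraction have already produced per-band cepstral frames, it assumes diagonal-covariance GMMs that are already trained, and it assumes the per-band log-likelihoods are summed with equal weights. All function names and the toy values are hypothetical.

```python
import math

def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames (CMN)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

def recombine_features(bands):
    """Feature recombination: concatenate each band's cepstra, frame by frame."""
    return [[c for band_frame in frame_group for c in band_frame]
            for frame_group in zip(*bands)]

def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of frames under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples (assumed layout).
    """
    total = 0.0
    for x in frames:
        logs = []
        for w, mu, var in gmm:
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            logs.append(ll)
        peak = max(logs)  # log-sum-exp over mixture components
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total

def recombine_likelihoods(bands, band_gmms):
    """Likelihood recombination: sum per-band log-likelihoods.

    Equal band weights are an assumption of this sketch.
    """
    return sum(gmm_log_likelihood(f, g) for f, g in zip(bands, band_gmms))

# Toy usage: two bands, two frames each (values are made up for illustration).
low_band = cepstral_mean_normalize([[1.0, 2.0], [3.0, 4.0]])
high_band = cepstral_mean_normalize([[0.5], [1.5]])

# Scheme 1: one combined feature vector per frame, scored by a single GMM.
combined = recombine_features([low_band, high_band])

# Scheme 2: one GMM per band, scores combined afterwards.
low_gmm = [(1.0, [0.0, 0.0], [1.0, 1.0])]
high_gmm = [(1.0, [0.0], [1.0])]
score = recombine_likelihoods([low_band, high_band], [low_gmm, high_gmm])
```

In an identification setting, a score of this kind would be computed for every enrolled speaker and the speaker with the highest score selected. With a single diagonal Gaussian per band the two schemes give the same score; they differ once each model has several mixture components.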

Author information

Authors and Affiliations

Dept. of Electronic Engineering, St. John’s & St. Mary’s Institute of Technology, Taipei
Wan-Chen Chen
Dept. of Electrical Engineering, Tamkang University, Taipei
Ching-Tang Hsieh & Eugene Lai

Editor information

Editors and Affiliations

Behavior Design Corporation, IV Science-Based Industrial Park Hsinchu, 2F, No.5, Industry E. Rd, Taiwan
Keh-Yih Su
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033; JST CREST, Honcho 4-1-8, Kawaguchi-shi, Saitama 332-0012
Jun’ichi Tsujii
Pohang University of Science and Technology (POSTECH), AITrc, Republic of Korea
Jong-Hyeok Lee
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Chen, WC., Hsieh, CT., Lai, E. (2005). Robust Speaker Identification System Based on Wavelet Transform and Gaussian Mixture Model. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_28

DOI: https://doi.org/10.1007/978-3-540-30211-7_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)
