Abstract
This paper presents a method for improving the performance of a speaker identification system. Exploiting the multiresolution property of the wavelet transform, the input speech signal is decomposed into several frequency bands so that noise distortions are not spread over the entire feature space. Linear predictive cepstral coefficients (LPCCs) are computed for each band, and cepstral mean normalization is applied to all computed features. Two schemes, feature recombination and likelihood recombination, are evaluated on text-independent speaker identification. The feature recombination scheme combines the cepstral coefficients of each band into a single feature vector, which is used to train a Gaussian mixture model (GMM). The likelihood recombination scheme combines the likelihood scores of independent GMMs, one per band. Experimental results show that both proposed methods outperform GMMs using full-band LPCCs and mel-frequency cepstral coefficients (MFCCs) in both clean and noisy environments.
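The two recombination schemes can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: per-band cepstral vectors are already computed and frame-aligned, each GMM has diagonal covariances and is given as a hypothetical `(weights, means, variances)` tuple, feature recombination is plain concatenation, and likelihood recombination is an unweighted sum of per-band log-likelihoods. All function names are illustrative.

```python
import math

def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames (CMN)."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

def feature_recombination(band_frames):
    """Join per-band vectors, frame by frame, into one feature vector.

    Assumption: every band has the same number of frames.
    """
    return [[c for band in per_frame for c in band]
            for per_frame in zip(*band_frames)]

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of the frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                log_p -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total

def likelihood_recombination(band_frames, band_gmms):
    """Assumed combination rule: sum the log-likelihoods of the band GMMs."""
    return sum(gmm_log_likelihood(frames, *gmm)
               for frames, gmm in zip(band_frames, band_gmms))

def identify(band_frames, speaker_models):
    """Return the speaker whose band GMMs give the highest combined score."""
    return max(speaker_models,
               key=lambda s: likelihood_recombination(band_frames, speaker_models[s]))

# Toy data: two bands, two frames each (values are made up for illustration).
low_band = cepstral_mean_normalize([[1.0, 2.0], [3.0, 4.0]])
high_band = cepstral_mean_normalize([[0.5], [1.5]])

# Feature recombination: one 3-dimensional vector per frame.
combined = feature_recombination([low_band, high_band])

# Likelihood recombination: one single-component GMM per band and speaker.
speaker_models = {
    "A": [([1.0], [[0.0, 0.0]], [[1.0, 1.0]]), ([1.0], [[0.0]], [[1.0]])],
    "B": [([1.0], [[5.0, 5.0]], [[1.0, 1.0]]), ([1.0], [[5.0]], [[1.0]])],
}
best = identify([low_band, high_band], speaker_models)
```

In the feature recombination path, `combined` would be the training input for a single GMM per speaker. In the likelihood recombination path, each band keeps its own model and only the scores are merged, which is why a distortion limited to one band affects only that band's term in the sum.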
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, WC., Hsieh, CT., Lai, E. (2005). Robust Speaker Identification System Based on Wavelet Transform and Gaussian Mixture Model. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_28
DOI: https://doi.org/10.1007/978-3-540-30211-7_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science (R0)