Robust Speaker Identification System Based on Wavelet Transform and Gaussian Mixture Model

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

This paper presents an effective method for improving the performance of a speaker identification system. Exploiting the multiresolution property of the wavelet transform, the input speech signal is decomposed into several frequency bands so that noise distortions stay confined to the affected bands instead of spreading over the entire feature space. The linear predictive cepstral coefficients (LPCCs) of each band are calculated, and cepstral mean normalization is then applied to all computed features. We evaluate two schemes, feature recombination and likelihood recombination, on the task of text-independent speaker identification. The feature recombination scheme concatenates the cepstral coefficients of all bands into a single feature vector, which is used to train a Gaussian mixture model (GMM). The likelihood recombination scheme trains an independent GMM for each band and combines their likelihood scores. Experimental results show that both proposed methods outperform a GMM trained on full-band LPCCs or on mel-frequency cepstral coefficients (MFCCs) in both clean and noisy environments.
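The two recombination schemes described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the wavelet decomposition and LPCC extraction are replaced by random arrays, the band count, frame count, and cepstral order are arbitrary, the function names are hypothetical, and summing per-band log-likelihoods is only one plausible combination rule (the abstract does not state which rule the paper uses).

```python
import numpy as np

def cepstral_mean_normalize(features):
    """Subtract each coefficient's mean over time; features is (frames, coeffs)."""
    return features - features.mean(axis=0, keepdims=True)

def feature_recombination(band_features):
    """Concatenate per-band cepstra frame by frame into one feature matrix."""
    return np.concatenate(band_features, axis=1)

def gmm_log_likelihood(x, weights, means, variances):
    """Total log-likelihood of frames x under a diagonal-covariance GMM.

    x: (T, D), weights: (M,), means: (M, D), variances: (M, D).
    """
    diff = x[:, None, :] - means[None, :, :]                           # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_weighted = log_comp + np.log(weights)
    # Log-sum-exp over mixture components, computed stably.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.sum())

def likelihood_recombination(band_features, band_gmms):
    """Assumed rule: add the log-likelihoods of the independent per-band GMMs."""
    return sum(gmm_log_likelihood(x, *gmm)
               for x, gmm in zip(band_features, band_gmms))

# Stand-in data: 3 bands, 50 frames, 12 cepstral coefficients per band.
rng = np.random.default_rng(0)
bands = [rng.normal(size=(50, 12)) + 3.0 for _ in range(3)]
normalized = [cepstral_mean_normalize(b) for b in bands]

# Feature recombination: one 36-dimensional vector per frame.
combined = feature_recombination(normalized)

# Likelihood recombination with a toy single-component GMM per band.
toy_gmm = (np.array([1.0]), np.zeros((1, 12)), np.ones((1, 12)))
score_lr = likelihood_recombination(normalized, [toy_gmm] * 3)

# The same toy model applied to the concatenated features.
score_fr = gmm_log_likelihood(
    combined, np.array([1.0]), np.zeros((1, 36)), np.ones((1, 36)))
```

In an identification setting, one such score would be computed per enrolled speaker and the speaker with the highest score selected. With the single-component toy models above the two schemes give the same score; they generally differ once each GMM has several components, because per-band models cannot capture dependencies between bands.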

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, WC., Hsieh, CT., Lai, E. (2005). Robust Speaker Identification System Based on Wavelet Transform and Gaussian Mixture Model. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_28

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
