Speaker Identification Based on Log Area Ratio and Gaussian Mixture Models in Narrow-Band Speech

Speech Understanding / Interaction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3157))

Abstract

Log area ratio (LAR) coefficients derived from linear prediction coefficients (LPC) are a well-known feature extraction technique used in speech applications. This paper presents a novel way to use the LAR feature in a speaker identification system. Here, instead of the mel frequency cepstral coefficients (MFCC), the LAR feature is used in a Gaussian mixture model (GMM) based speaker identification system. An F-ratio feature analysis conducted on both the LAR and MFCC feature vectors showed that the lower-order LAR coefficients are superior to their MFCC counterparts. The text-independent, closed-set speaker identification rate, as tested on a down-sampled version of the TIMIT database, improved from 96.73% using the MFCC feature to 98.81% using the LAR feature.
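As a rough illustration of the two quantities the abstract names, the sketch below computes LAR coefficients for one speech frame and a per-dimension F-ratio across speakers. It is not the authors' implementation: the function names, the Levinson-Durbin route to the reflection coefficients, the sign convention log((1 + k) / (1 - k)), and the F-ratio definition (variance of the speaker means divided by the average within-speaker variance) are all assumptions chosen for the example. The GMM stage is not shown.

```python
import numpy as np

def reflection_coefficients(frame, order):
    """Reflection (PARCOR) coefficients via Levinson-Durbin (assumed method).

    Assumes the frame is not all zeros, so the prediction error stays positive.
    """
    n = len(frame)
    # Autocorrelation at lags 0..order (unnormalised, autocorrelation method).
    r = np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)   # predictor polynomial A(z) = 1 + sum a[j] z^-j
    a[0] = 1.0
    err = r[0]                # prediction error energy
    k = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k_i = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k_i * a_prev[i - j]
        a[i] = k_i
        err *= 1.0 - k_i ** 2
        k[i - 1] = k_i
    return k

def log_area_ratios(frame, order):
    """LAR_i = log((1 + k_i) / (1 - k_i)); the sign convention is an assumption."""
    k = reflection_coefficients(np.asarray(frame, dtype=float), order)
    return np.log((1.0 + k) / (1.0 - k))

def f_ratio(features_by_speaker):
    """Per-dimension F-ratio: variance of speaker means over mean within-speaker variance.

    `features_by_speaker` is a list of (frames, dims) arrays, one per speaker.
    """
    means = np.array([f.mean(axis=0) for f in features_by_speaker])
    within = np.array([f.var(axis=0) for f in features_by_speaker]).mean(axis=0)
    return means.var(axis=0) / within

# Usage on a synthetic frame: a decaying exponential behaves like a first-order
# all-pole signal, so only the first reflection coefficient is far from zero.
frame = 0.5 ** np.arange(64)
lar = log_area_ratios(frame, order=4)
```

A higher F-ratio for a feature dimension indicates that speakers are better separated along it, which is the sense in which the abstract compares the lower-order LAR and MFCC coefficients.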




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chow, D., Abdulla, W.H. (2004). Speaker Identification Based on Log Area Ratio and Gaussian Mixture Models in Narrow-Band Speech. In: Zhang, C., Guesgen, H.W., Yeap, W.K. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_95


  • DOI: https://doi.org/10.1007/978-3-540-28633-2_95

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22817-2

  • Online ISBN: 978-3-540-28633-2

  • eBook Packages: Springer Book Archive
