Speaker Identification Based on Log Area Ratio and Gaussian Mixture Models in Narrow-Band Speech

Chow, David; Abdulla, Waleed H.

doi:10.1007/978-3-540-28633-2_95

Speech Understanding / Interaction

David Chow & Waleed H. Abdulla

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3157)

Abstract

Log area ratio (LAR) coefficients, derived from linear prediction coefficients (LPC), are a well-known feature extraction technique used in speech applications. This paper presents a novel way to use the LAR feature in a speaker identification system. Here, instead of the mel frequency cepstral coefficients (MFCC), the LAR feature is used in a Gaussian mixture model (GMM) based speaker identification system. An F-ratio feature analysis conducted on both the LAR and MFCC feature vectors showed that the lower-order LAR coefficients are superior to their MFCC counterparts. The text-independent, closed-set speaker identification rate, as tested on a down-sampled version of the TIMIT database, improved from 96.73% using the MFCC feature to 98.81% using the LAR features.
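The pipeline the abstract describes (LPC analysis, conversion to LAR features, per-dimension F-ratio ranking, and GMM-based closed-set identification) can be sketched roughly as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' implementation: it assumes each frame is already pre-emphasised and windowed, obtains the reflection (PARCOR) coefficients k_i with the Levinson-Durbin recursion, uses the convention LAR_i = log((1 + k_i) / (1 - k_i)) (some texts use the reciprocal ratio, which only flips the sign), and assumes each speaker's diagonal-covariance GMM has already been trained, e.g. with EM. All function names are hypothetical.

```python
import math

def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion for the predictor x[n] ~ sum_j a[j] * x[n - j].

    Returns (a[1..p], k[1..p], final prediction-error energy).
    """
    a = [0.0] * (order + 1)
    k = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k_i = acc / err
        new_a = a[:]
        new_a[i] = k_i
        for j in range(1, i):
            new_a[j] = a[j] - k_i * a[i - j]
        a, k[i] = new_a, k_i
        err *= 1.0 - k_i * k_i
    return a[1:], k[1:], err

def log_area_ratios(k):
    """LAR_i = log((1 + k_i) / (1 - k_i)); assumed convention, needs |k_i| < 1."""
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]

def f_ratio(groups):
    """F-ratio of one feature dimension: variance of the per-speaker means
    divided by the average within-speaker variance (groups: one list per speaker)."""
    means = [sum(g) / len(g) for g in groups]
    grand = sum(means) / len(means)
    between = sum((m - grand) ** 2 for m in means) / len(means)
    within = sum(sum((x - m) ** 2 for x in g) / len(g)
                 for g, m in zip(groups, means)) / len(groups)
    return between / within

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comps.append(ll)
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))

def identify(frames, models):
    """Closed-set identification: pick the speaker whose (pre-trained) GMM gives
    the highest total log-likelihood. models: {name: (weights, means, variances)}."""
    return max(models, key=lambda s: sum(gmm_log_likelihood(x, *models[s]) for x in frames))
```

In use, each frame would yield one LAR vector (autocorrelation, then `levinson_durbin`, then `log_area_ratios`); the same vectors would feed both the per-dimension F-ratio comparison and GMM training and scoring.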

Author information

Authors and Affiliations

Electrical and Electronic Engineering Department, The University of Auckland, Auckland, New Zealand
David Chow & Waleed H. Abdulla

Editor information

Editors and Affiliations

Faculty of Engineering and Information Technology, Centre for Quantum Computation and Intelligent Systems, and Australian ACS National Committee for Artificial Intelligence, University of Technology, Sydney, Australia
Chengqi Zhang
Department of Computer Science, Auckland University of Technology, 1020, Auckland, New Zealand
Hans W. Guesgen
Artificial Intelligence Technology Centre, Auckland University of Technology, Auckland, New Zealand
Wai-Kiang Yeap

About this paper

Cite this paper

Chow, D., Abdulla, W.H. (2004). Speaker Identification Based on Log Area Ratio and Gaussian Mixture Models in Narrow-Band Speech. In: Zhang, C., Guesgen, H.W., Yeap, W.K. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_95

DOI: https://doi.org/10.1007/978-3-540-28633-2_95
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22817-2
Online ISBN: 978-3-540-28633-2
eBook Packages: Springer Book Archive