Abstract
In this paper, we study prosodic features derived from pitch parameters to improve the performance of a speaker identification (SID) system. To deal with the problem of missing pitch in telephone speech, we estimate pitch for every frame, including unvoiced regions. After removing silence frames, we further improve prosodic modeling by using a weighted form of the logarithm of pitch. The new prosodic features are then combined with MFCC parameters. Using our Gaussian Mixture Model-Universal Background Model (GMM-UBM) recognizer, we conduct SID experiments on the NIST 2001 cellular telephone corpus. Compared with MFCC features alone, the combined features yield a relative error reduction of 7.0% for male speakers and 2.5% for female speakers. We also discuss advanced pitch extraction and modeling approaches for improving SID systems.
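The feature combination described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the weighting function, the silence detector, or the feature layout, so everything below is an assumption made for illustration: the energy-threshold silence test, the per-frame `weights` argument, and all function and parameter names are hypothetical and are not the authors' implementation. The second function only restates the standard definition of relative error reduction used to report results such as the 7.0% and 2.5% figures.

```python
import math


def combine_features(mfcc_frames, pitch_hz, energy, weights, energy_floor=1e-3):
    """Append a weighted log-pitch value to each non-silent MFCC frame.

    Assumptions (not taken from the paper): silence is detected by a simple
    energy threshold, and the weighting is a caller-supplied per-frame factor.
    """
    if not (len(mfcc_frames) == len(pitch_hz) == len(energy) == len(weights)):
        raise ValueError("all per-frame inputs must have the same length")

    combined = []
    for mfcc, f0, frame_energy, weight in zip(mfcc_frames, pitch_hz, energy, weights):
        if frame_energy < energy_floor:
            continue  # drop silence frames before building features
        if f0 <= 0:
            # Pitch is assumed to be estimated for every kept frame,
            # including unvoiced ones, so it must be positive here.
            raise ValueError("pitch estimate must be positive for every kept frame")
        combined.append(list(mfcc) + [weight * math.log(f0)])
    return combined


def relative_error_reduction(baseline_error, new_error):
    """Relative error reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Each returned frame has one more dimension than the input MFCC vector, so a GMM-UBM back end could in principle be trained on the combined vectors without other changes; whether the paper fuses the features at this level or at the score level is not stated in the abstract.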
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zheng, R., Zhang, S., Xu, B. (2004). Improvement of Speaker Identification by Combining Prosodic Features with Acoustic Features. In: Li, S.Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds) Advances in Biometric Person Authentication. SINOBIOMETRICS 2004. Lecture Notes in Computer Science, vol 3338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30548-4_65
DOI: https://doi.org/10.1007/978-3-540-30548-4_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24029-7
Online ISBN: 978-3-540-30548-4
eBook Packages: Computer Science, Computer Science (R0)