Improvement of Speaker Identification by Combining Prosodic Features with Acoustic Features

Zheng, Rong; Zhang, Shuwu; Xu, Bo

doi:10.1007/978-3-540-30548-4_65

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3338)

Included in the following conference series:

Chinese Conference on Biometric Recognition

Abstract

In this paper, we study prosodic features derived from pitch parameters to improve the performance of a speaker identification (SID) system. To deal with the problem of missing pitch in telephone speech, we estimate pitch for every frame, even in unvoiced regions. After removing silence frames, we also improve prosodic modeling by using a weighted form of the logarithm of pitch. The new prosodic features are then combined with MFCC parameters. Using our Gaussian Mixture Model-Universal Background Model (GMM-UBM) recognizer, we conduct SID experiments on the NIST 2001 cellular telephone corpus. Compared to MFCC features alone, the combined features yield a relative error reduction of 7.0% for male speakers and 2.5% for female speakers. We also discuss advanced pitch extraction and modeling approaches for improving SID systems.
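
As a rough illustration of the pipeline summarized above, the following sketch appends a weighted log-pitch value to each non-silence MFCC frame and computes a relative error reduction. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the weighting form, so the scalar `weight`, the function names, the toy frames, and the error rates are all hypothetical.

```python
import math


def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def combine_features(mfcc_frames, pitch_hz, is_silence, weight=1.0):
    """Append weight * log(pitch) to every non-silence MFCC frame.

    Assumption: pitch_hz holds a positive estimate for each frame,
    including unvoiced ones, as the abstract describes. The scalar
    weight is a placeholder for the paper's unspecified weighting.
    """
    combined = []
    for mfcc, f0, silent in zip(mfcc_frames, pitch_hz, is_silence):
        if silent:
            continue  # silence frames are dropped before modeling
        combined.append(list(mfcc) + [weight * math.log(f0)])
    return combined


# Hypothetical toy input: three frames of 2-dimensional "MFCC" vectors.
frames = [[1.0, 2.0], [0.5, -0.5], [0.0, 0.3]]
pitch = [120.0, 95.0, 110.0]
silence = [False, True, False]
features = combine_features(frames, pitch, silence, weight=0.5)

# Hypothetical error rates, chosen only to show what a 7.0% relative
# reduction means; they are not taken from the paper.
rer = relative_error_reduction(0.20, 0.186)
```

In a GMM-UBM setup, vectors such as `features` would replace the plain MFCC frames as model input. A relative reduction compares error rates as a ratio, so the same 7.0% figure corresponds to different absolute gains depending on the baseline.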

Author information

Authors and Affiliations

High Technology and Innovation Center, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Rong Zheng, Shuwu Zhang & Bo Xu

Editor information

Editors and Affiliations

Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences,
Stan Z. Li
Department of Electronics & Communication Engineering, Sun Yat-Sen University, Guangzhou, China
Jianhuang Lai
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Center of Computer Vision, School of Mathematics and Computing Science, Sun Yat-sen University, 510275, Guangzhou, China
Guocan Feng
School of Computer Science and Engineering, Beihang University, Beijing, China
Yunhong Wang

About this paper

Cite this paper

Zheng, R., Zhang, S., Xu, B. (2004). Improvement of Speaker Identification by Combining Prosodic Features with Acoustic Features. In: Li, S.Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds) Advances in Biometric Person Authentication. SINOBIOMETRICS 2004. Lecture Notes in Computer Science, vol 3338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30548-4_65

DOI: https://doi.org/10.1007/978-3-540-30548-4_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24029-7
Online ISBN: 978-3-540-30548-4
eBook Packages: Computer Science, Computer Science (R0)
