Exploiting Glottal Information in Speaker Recognition Using Parallel GMMs

  • Conference paper
Audio- and Video-Based Biometric Person Authentication (AVBPA 2005)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3546)

Abstract

The vocal tract and the glottis are two sources of information that can characterize speakers. Although vocal tract features have achieved quite good performance in automatic speaker recognition (ASR) tasks, glottal information performs poorly when used on its own. This work explores how to combine vocal tract and glottal information efficiently and effectively. Taking into account the short-term correlation between them, we first propose an improved joint probability function model of the corresponding features. We then present a novel integrated system that uses parallel Gaussian Mixture Models (GMMs) built on this function. Combined with the traditional GMM, it also forms a hybrid model. Both methods were applied to the YOHO and SRMC corpora, and the experiments show promising results.
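
As a rough illustration of the general idea of running two GMMs in parallel, one per feature stream, the sketch below scores a test utterance against each enrolled speaker by combining per-frame log-likelihoods from a vocal tract GMM and a glottal GMM. It is not the joint probability function or the parallel-GMM system proposed in the paper, whose details the abstract does not give. The weighted-sum fusion, the weight `alpha`, all function names, and the toy model parameters are assumptions made purely for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one frame under a diagonal GMM (via log-sum-exp)."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def fused_score(tract_frames, glottal_frames, tract_gmm, glottal_gmm, alpha=0.8):
    """Mean per-frame score: alpha * tract log-lik + (1 - alpha) * glottal log-lik.

    alpha is a hypothetical fusion weight, not a value taken from the paper.
    Both streams are assumed to be frame-aligned and of equal length.
    """
    total = 0.0
    for xt, xg in zip(tract_frames, glottal_frames):
        total += alpha * gmm_loglik(xt, *tract_gmm)
        total += (1.0 - alpha) * gmm_loglik(xg, *glottal_gmm)
    return total / len(tract_frames)


def identify(tract_frames, glottal_frames, speakers, alpha=0.8):
    """Return the enrolled speaker whose pair of GMMs gives the highest fused score."""
    return max(
        speakers,
        key=lambda name: fused_score(
            tract_frames, glottal_frames, *speakers[name], alpha=alpha
        ),
    )


# Toy models (made up): each speaker has (tract_gmm, glottal_gmm),
# and each GMM is a tuple (weights, means, variances).
speakers = {
    "spk_a": (
        ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
        ([1.0], [[120.0]], [[100.0]]),
    ),
    "spk_b": (
        ([0.5, 0.5], [[3.0, 3.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]),
        ([1.0], [[200.0]], [[100.0]]),
    ),
}

# Toy test utterance: 2-D spectral frames and 1-D pitch-like frames.
tract_frames = [[0.2, 0.1], [0.9, 1.1]]
glottal_frames = [[118.0], [125.0]]

best = identify(tract_frames, glottal_frames, speakers)
```

A weighted sum of log-likelihoods treats the two streams as independent given the speaker, which is exactly the simplification the abstract's joint model, with its short-term correlation between vocal tract and glottal features, is meant to improve on.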

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, P., Yang, Y., Wu, Z. (2005). Exploiting Glottal Information in Speaker Recognition Using Parallel GMMs. In: Kanade, T., Jain, A., Ratha, N.K. (eds) Audio- and Video-Based Biometric Person Authentication. AVBPA 2005. Lecture Notes in Computer Science, vol 3546. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527923_84

  • DOI: https://doi.org/10.1007/11527923_84

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27887-0

  • Online ISBN: 978-3-540-31638-1

  • eBook Packages: Computer Science, Computer Science (R0)
