Abstract
Vocal tract information and glottal information are two sources that can characterize speakers. Although the former has achieved good performance in automatic speaker recognition (ASR) tasks, glottal information performs poorly when used on its own. This work explores how to combine vocal tract and glottal information efficiently and effectively. Taking into account the short-term correlation between them, we first propose an improved joint probability function model of the corresponding features. We then present a novel integrated system that uses parallel Gaussian Mixture Models (GMMs) based on this function. Combined with the traditional GMM, it also forms a hybrid model. Both methods were applied to the YOHO and SRMC corpora, and the experiments show promising results.
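The abstract does not give the form of the joint probability function or the layout of the parallel GMMs, so the following is only a generic sketch of how two feature streams might be scored side by side, not the authors' model. It assumes one diagonal-covariance GMM per speaker for vocal tract features and one for glottal features (for example, pitch), and fuses their per-frame log-likelihoods with a fixed weight. The function names, the weight `alpha`, and the toy parameters are all hypothetical, and this simple weighted sum treats the streams as independent, which is precisely the simplification the paper's joint model is meant to improve on.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one frame under a diagonal GMM (log-sum-exp for stability)."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def fused_score(tract_frames, glottal_frames, tract_gmm, glottal_gmm, alpha=0.8):
    """Average per-frame weighted sum of the two streams' log-likelihoods.

    alpha is a hypothetical fusion weight; the paper does not specify one.
    """
    total = 0.0
    for xt, xg in zip(tract_frames, glottal_frames):
        total += alpha * gmm_loglik(xt, *tract_gmm)
        total += (1.0 - alpha) * gmm_loglik(xg, *glottal_gmm)
    return total / len(tract_frames)

def identify(tract_frames, glottal_frames, speakers, alpha=0.8):
    """Return the enrolled speaker whose pair of GMMs gives the highest fused score."""
    return max(
        speakers,
        key=lambda name: fused_score(
            tract_frames, glottal_frames, *speakers[name], alpha
        ),
    )

# Toy models: each GMM is (weights, means, variances); values are made up.
speakers = {
    "A": (([1.0], [[0.0, 0.0]], [[1.0, 1.0]]), ([1.0], [[120.0]], [[100.0]])),
    "B": (([1.0], [[3.0, 3.0]], [[1.0, 1.0]]), ([1.0], [[200.0]], [[100.0]])),
}
tract_frames = [[0.1, -0.2], [0.3, 0.1]]   # stand-in for cepstral features
glottal_frames = [[118.0], [125.0]]        # stand-in for per-frame pitch in Hz

best = identify(tract_frames, glottal_frames, speakers)
```

Setting `alpha` to 1.0 reduces the score to the vocal tract GMM alone, which gives a convenient baseline for checking whether the glottal stream helps on a given data set.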
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, P., Yang, Y., Wu, Z. (2005). Exploiting Glottal Information in Speaker Recognition Using Parallel GMMs. In: Kanade, T., Jain, A., Ratha, N.K. (eds) Audio- and Video-Based Biometric Person Authentication. AVBPA 2005. Lecture Notes in Computer Science, vol 3546. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527923_84
DOI: https://doi.org/10.1007/11527923_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27887-0
Online ISBN: 978-3-540-31638-1
eBook Packages: Computer Science, Computer Science (R0)