Abstract
Recently, the expansion of mobile communication has stimulated considerable research interest in robust speaker recognition in multi-channel environments. Building a robust automatic speaker recognition (ASR) system has therefore become an urgent problem. Although glottal information has been used successfully in many speaker recognition systems, the spectral variations it causes have not been taken into account in multi-channel environments. In this paper, we propose a method that exploits this influence using both long-term and short-term glottal information. Through this recuperation, spectral features become more robust to channel effects in a text-independent ASR system. The method was applied to the large multi-channel SRMC corpus, and the experiments show promising results.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, P., Yang, Y., Wu, Z. (2004). Glottal Information Based Spectral Recuperation in Multi-channel Speaker Recognition. In: Li, S.Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds) Advances in Biometric Person Authentication. SINOBIOMETRICS 2004. Lecture Notes in Computer Science, vol 3338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30548-4_69
DOI: https://doi.org/10.1007/978-3-540-30548-4_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24029-7
Online ISBN: 978-3-540-30548-4
eBook Packages: Computer Science, Computer Science (R0)