Abstract
This paper presents a text-prompted speaker verification system based on the ten Chinese digits. The basic acoustic models are speaker-dependent, content-dependent phoneme HMMs, generated by adapting speaker-independent models to the utterances of specific speakers. Normalization techniques used in text-dependent speaker verification (TDSV) face an obvious constraint: the competing cohort models must be built from phrases with the same content. As a result, many score normalization techniques are either difficult to implement for lack of data or yield little performance improvement because the normalization parameters are poorly estimated. We propose a method that combines the traditional T-Norm and Cohort Norm to find a good tradeoff between test-utterance normalization and target-speaker-model normalization. The proposed method reduces the equal error rate to 2.50%, from baselines of 3.42% for T-Norm and 2.72% for Cohort Norm.
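To make the two normalizations named above concrete, the following is a minimal Python sketch of their conventional forms. The abstract does not give the paper's combination formula, so the `combined_norm` function and its `weight` parameter are purely hypothetical placeholders, not the authors' method. All function and parameter names are assumptions, and Cohort Norm is taken here in its mean-subtraction variant, which is only one of several in use.

```python
from statistics import mean, pstdev


def t_norm(target_score, impostor_scores):
    """Conventional T-Norm: standardize the target-model score using the
    scores that a set of impostor models give the SAME test utterance."""
    mu = mean(impostor_scores)
    sigma = pstdev(impostor_scores)
    if sigma == 0:
        raise ValueError("impostor scores have zero variance")
    return (target_score - mu) / sigma


def cohort_norm(target_score, cohort_scores):
    """Cohort normalization (mean-subtraction variant, an assumption):
    subtract the average score of the target speaker's cohort models."""
    return target_score - mean(cohort_scores)


def combined_norm(target_score, impostor_scores, cohort_scores, weight=0.5):
    """HYPOTHETICAL combination: a linear interpolation of the two scores.
    This only illustrates the idea of trading one normalization off
    against the other; it is not the formula from the paper."""
    return (weight * t_norm(target_score, impostor_scores)
            + (1.0 - weight) * cohort_norm(target_score, cohort_scores))


if __name__ == "__main__":
    # Made-up log-likelihood scores, for illustration only.
    print(t_norm(2.0, [-1.0, 1.0]))
    print(cohort_norm(2.0, [0.5, 1.5]))
    print(combined_norm(2.0, [-1.0, 1.0], [0.5, 1.5]))
```

In a text-prompted setting, the impostor and cohort scores would have to come from models scored on the same prompted digit string, which is the content constraint the abstract describes.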
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, J., Dong, Y., Dong, C., Wang, H. (2007). Score Normalization Technique for Text-Prompted Speaker Verification with Chinese Digits. In: Huang, DS., Heutte, L., Loog, M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. ICIC 2007. Lecture Notes in Computer Science, vol 4682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74205-0_112
DOI: https://doi.org/10.1007/978-3-540-74205-0_112
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74201-2
Online ISBN: 978-3-540-74205-0
eBook Packages: Computer Science (R0)