Abstract
This paper describes our recent efforts in exploring effective discriminative features for speaker recognition. Recent research indicates that appropriate fusion of features is critical to improving the performance of speaker recognition systems. We describe our approach to the NIST 2006 Speaker Recognition Evaluation. Our system integrates cepstral GMM modeling, cepstral SVM modeling, and tokenization at both the phone level and the frame level. Experimental results on both the NIST 2005 SRE corpus and the NIST 2006 SRE corpus are presented. The fused system achieved an equal error rate of 8.14% on the 1conv4w-1conv4w test condition of the NIST 2006 SRE.
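For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate can be estimated from trial scores, together with a simple weighted-sum fusion of subsystem scores. The abstract does not state how the subsystems were fused, so the linear fusion here is an assumption for illustration only; the function names, weights, and scores are hypothetical and are not taken from the paper.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (miss + fa) / 2
    return eer


def fuse(scores_per_system, weights):
    """Weighted linear sum of per-trial scores (assumed fusion rule, not the paper's)."""
    return [
        sum(w * s for w, s in zip(weights, trial))
        for trial in zip(*scores_per_system)
    ]


# Hypothetical scores for four target trials and four impostor trials.
targets = [2.0, 1.5, 0.2, 1.0]
impostors = [0.1, 0.5, -0.3, 1.2]
print(equal_error_rate(targets, impostors))
```

At the equal error rate, the miss rate and the false-alarm rate coincide, so a lower value means better separation between target and impostor trials.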
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tong, R. et al. (2006). Fusion of Acoustic and Tokenization Features for Speaker Recognition. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_59
DOI: https://doi.org/10.1007/11939993_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science (R0)