Fusion of Acoustic and Tokenization Features for Speaker Recognition

Tong, Rong; Ma, Bin; Lee, Kong-Aik; You, Changhuai; Zhu, Donglai; Kinnunen, Tomi; Sun, Hanwu; Dong, Minghui; Chng, Eng-Siong; Li, Haizhou

doi:10.1007/11939993_59

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4274)

Included in the following conference series:

International Symposium on Chinese Spoken Language Processing

Abstract

This paper describes our recent efforts to explore effective discriminative features for speaker recognition. Recent research indicates that appropriate fusion of features is critical to improving the performance of speaker recognition systems. We describe our approach for the NIST 2006 Speaker Recognition Evaluation (SRE). Our system integrates cepstral GMM modeling, cepstral SVM modeling, and tokenization at both the phone level and the frame level. Experimental results on both the NIST 2005 SRE and NIST 2006 SRE corpora are presented. The fused system achieved an 8.14% equal error rate on the 1conv4w-1conv4w test condition of the NIST 2006 SRE.
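
As an illustration only, the sketch below shows one common way to combine subsystem scores and to estimate an equal error rate (EER) from the result. The abstract does not say how the subsystems were fused, so the weighted sum, the function names `fuse` and `equal_error_rate`, the weights, and the toy scores are all assumptions made for this example, not the authors' method or data.

```python
def fuse(score_lists, weights):
    """Weighted sum of per-trial scores from several subsystems.

    score_lists: one list of trial scores per subsystem, all the same length.
    weights: one weight per subsystem (hypothetical values, not from the paper).
    """
    return [sum(w * s for w, s in zip(weights, trial)) for trial in zip(*score_lists)]


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over every observed score.

    Returns the mean of the false-acceptance and false-rejection rates at the
    threshold where the two rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a target trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores for two hypothetical subsystems over the same four trials.
    gmm_targets, svm_targets = [2.0, 1.5, 0.4, 1.0], [1.0, 2.5, 0.6, 1.0]
    gmm_impostors, svm_impostors = [0.5, -1.0, 0.0, -0.5], [0.1, -0.5, 0.2, -1.0]
    fused_targets = fuse([gmm_targets, svm_targets], [0.5, 0.5])
    fused_impostors = fuse([gmm_impostors, svm_impostors], [0.5, 0.5])
    print(f"fused EER: {equal_error_rate(fused_targets, fused_impostors):.2%}")
```

In practice fusion weights are usually tuned on a held-out development set, and an EER on real evaluation data is computed over many thousands of trials rather than a handful.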

Author information

Authors and Affiliations

Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Rong Tong, Bin Ma, Kong-Aik Lee, Changhuai You, Donglai Zhu, Tomi Kinnunen, Hanwu Sun, Minghui Dong & Haizhou Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Rong Tong, Eng-Siong Chng & Haizhou Li

Editor information

Editors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
Qiang Huo
Human Language Technology Department, Institute for Infocomm Research (I2R), 119613, Singapore
Bin Ma
School of Computer Engineering, Nanyang Technological University (NTU), 639798, Singapore
Eng-Siong Chng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Haizhou Li

Cite this paper

Tong, R. et al. (2006). Fusion of Acoustic and Tokenization Features for Speaker Recognition. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_59

DOI: https://doi.org/10.1007/11939993_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)
