Abstract
This paper describes the design and implementation of a practical automatic speaker recognition system for the CSLP speaker recognition evaluation (SRE). The system is built upon four subsystems that draw speaker information from acoustic spectral features. In addition to the conventional spectral features, a novel temporal discrete cosine transform (TDCT) feature is introduced to capture long-term speech dynamics. The speaker information is modeled using two complementary speaker modeling techniques, namely, the Gaussian mixture model (GMM) and the support vector machine (SVM). The resulting subsystems are then integrated at the score level through a multilayer perceptron (MLP) neural network. Evaluation results confirm that the feature selection, classifier design, and fusion strategy are successful, giving rise to an effective speaker recognition system.
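To make the TDCT idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a TDCT feature is formed by taking a sliding block of consecutive cepstral frames, applying a DCT along the time axis to the trajectory of each cepstral coefficient, and stacking the lowest-order DCT coefficients into one vector. The function names `dct_ii` and `tdct_features`, the window length, the hop, the number of retained coefficients, and the unnormalized DCT-II are all illustrative assumptions; the abstract does not state these settings.

```python
import math


def dct_ii(x, num_coeffs):
    """Return the first num_coeffs unnormalized DCT-II coefficients of x."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n)) for t in range(n))
        for k in range(num_coeffs)
    ]


def tdct_features(frames, window=8, num_coeffs=3, hop=1):
    """Compute TDCT-style vectors from a list of per-frame cepstral vectors.

    frames     : list of equal-length lists (one cepstral vector per frame)
    window     : number of consecutive frames per block (assumed value)
    num_coeffs : DCT coefficients kept per cepstral dimension (assumed value)
    hop        : shift between successive blocks, in frames (assumed value)
    """
    if not frames or len(frames) < window:
        return []  # not enough frames to fill a single block
    dim = len(frames[0])
    features = []
    for start in range(0, len(frames) - window + 1, hop):
        block = frames[start:start + window]
        vector = []
        for d in range(dim):
            # Time trajectory of one cepstral coefficient across the block.
            trajectory = [frame[d] for frame in block]
            vector.extend(dct_ii(trajectory, num_coeffs))
        features.append(vector)
    return features
```

With these assumed settings, each output vector has `dim * num_coeffs` entries. The zeroth DCT coefficient of a trajectory is its sum over the block, while the higher-order coefficients describe how that cepstral dimension varies over the block, which is the long-term dynamic information the abstract refers to.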
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, KA. et al. (2006). The IIR Submission to CSLP 2006 Speaker Recognition Evaluation. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_52
DOI: https://doi.org/10.1007/11939993_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science (R0)