Abstract
This paper presents an improved speaker verification technique that is especially well suited to surveillance scenarios. The main idea is a meta-learning scheme that improves the fusion of low- and high-level speech information. Whereas some existing systems fuse the outputs of several classifiers, the proposed method applies a selective fusion scheme that takes into account the transmission channel, speaking style, and speaker stress, as estimated from the test utterance. Moreover, we show that simultaneously employing multi-resolution versions of the regular classifiers boosts fusion performance. The proposed selective fusion method, aided by multi-resolution classifiers, reduces the error rate by 30% compared with ordinary fusion.
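To make the idea of condition-dependent fusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each test utterance has already been assigned an estimated condition (channel, speaking style, stress) and that one linear fusion rule has been trained per condition. The names `LinearFuser` and `selective_fuse`, the condition labels, and all weight values are hypothetical and chosen only for illustration; the abstract does not specify the meta-learner or its parameters.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical condition label: (channel, speaking style, stress level).
Condition = Tuple[str, str, str]


@dataclass(frozen=True)
class LinearFuser:
    """Weighted sum of classifier scores plus a bias (illustrative only)."""
    weights: Tuple[float, ...]
    bias: float = 0.0

    def fuse(self, scores: Sequence[float]) -> float:
        if len(scores) != len(self.weights):
            raise ValueError("expected one score per classifier")
        return sum(w * s for w, s in zip(self.weights, scores)) + self.bias


def selective_fuse(
    scores: Sequence[float],
    condition: Condition,
    fusers: Dict[Condition, LinearFuser],
    default: LinearFuser,
) -> float:
    """Fuse with the rule matching the estimated condition, else the default."""
    return fusers.get(condition, default).fuse(scores)


# Ordinary fusion: one rule for every utterance (assumed equal weights).
DEFAULT = LinearFuser(weights=(0.5, 0.5))

# Selective fusion: assumed weights that lean on the high-level classifier
# when the utterance is estimated to be stressed speech over a cellular channel.
FUSERS = {
    ("cellular", "conversational", "stressed"): LinearFuser(weights=(0.25, 0.75)),
}

# scores = [low-level classifier score, high-level classifier score]
scores = [1.0, 3.0]
fused_default = selective_fuse(scores, ("landline", "read", "neutral"), FUSERS, DEFAULT)
fused_selected = selective_fuse(
    scores, ("cellular", "conversational", "stressed"), FUSERS, DEFAULT
)
```

In this sketch, an utterance whose estimated condition has no dedicated rule falls back to ordinary fusion, so the selective scheme only changes the decision where a condition-specific rule exists.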
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Solewicz, Y.A., Koppel, M. (2005). Selective Fusion for Speaker Verification in Surveillance. In: Kantor, P., et al. Intelligence and Security Informatics. ISI 2005. Lecture Notes in Computer Science, vol 3495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11427995_22
DOI: https://doi.org/10.1007/11427995_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25999-2
Online ISBN: 978-3-540-32063-0
eBook Packages: Computer Science, Computer Science (R0)