Classification Methods for Speaker Recognition

Speaker Classification I

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

Automatic speaker recognition systems have a foundation built on ideas and techniques from the areas of speech science for speaker characterization, pattern recognition and engineering. In this chapter we provide an overview of the features, models, and classifiers derived from these areas that are the basis for modern automatic speaker recognition systems. We describe the components of state-of-the-art automatic speaker recognition systems, discuss application considerations and provide a brief survey of accuracy for different tasks.

This work was sponsored by the Department of Justice under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sturim, D.E., Campbell, W.M., Reynolds, D.A. (2007). Classification Methods for Speaker Recognition. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_16

  • DOI: https://doi.org/10.1007/978-3-540-74200-5_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74186-2

  • Online ISBN: 978-3-540-74200-5

  • eBook Packages: Computer Science (R0)
