Higher-Level Features in Speaker Recognition

  • Chapter
Speaker Classification I

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4343)

Abstract

Higher-level features based on linguistic or long-range information have attracted significant attention in automatic speaker recognition. This article briefly summarizes approaches to using higher-level features for text-independent speaker verification over the last decade. To clarify how each approach uses higher-level information, features are described in terms of their type, temporal span, and reliance on automatic speech recognition for both feature extraction and feature conditioning. A subsequent analysis of higher-level features in a state-of-the-art system illustrates that (1) a higher-level cepstral system outperforms standard systems, (2) a prosodic system shows excellent performance individually and in combination, (3) other higher-level systems provide further gains, and (4) higher-level systems provide increasing relative gains as training data increases. Implications for the general field of speaker classification are discussed.

Editor information

Christian Müller

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Shriberg, E. (2007). Higher-Level Features in Speaker Recognition. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science (LNAI), vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_14

  • DOI: https://doi.org/10.1007/978-3-540-74200-5_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74186-2

  • Online ISBN: 978-3-540-74200-5

  • eBook Packages: Computer Science (R0)
