Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4885)

Abstract

We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but are unable to contribute in a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that contributions of the MLP systems are no longer additive.
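
The feature transformation described in the abstract can be illustrated with a minimal sketch. It assumes a single hidden layer with sigmoid units and a softmax output; the layer sizes, the random weights, and the helper names (`make_layer`, `mlp_features`) are hypothetical and chosen only for illustration. It does not reproduce the authors' networks, training procedure, or any post-processing, none of which the abstract specifies.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer (untrained, illustrative)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def affine(layer, x):
    """Compute W x + b for one input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_features(frame, hidden_layer, output_layer):
    """Return (hidden activations, output activations) for one cepstral frame."""
    hidden = sigmoid(affine(hidden_layer, frame))
    outputs = softmax(affine(output_layer, hidden))
    return hidden, outputs

rng = random.Random(0)
N_CEPSTRA, N_HIDDEN, N_TARGETS = 13, 8, 5  # hypothetical sizes, not taken from the paper
hidden_layer = make_layer(N_CEPSTRA, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_TARGETS, rng)

frame = [rng.gauss(0.0, 1.0) for _ in range(N_CEPSTRA)]  # stand-in for one cepstral frame
hidden, outputs = mlp_features(frame, hidden_layer, output_layer)

# With phone targets, the output activations would feed a GMM system;
# with speaker targets, the hidden activations would feed an SVM system.
gmm_features = outputs
svm_features = hidden
```

In this sketch the same forward pass yields both feature types; which one is kept depends only on whether the network was trained on phone or speaker targets.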

Editor information

Mohamed Chetouani, Amir Hussain, Bruno Gas, Maurice Milgram, Jean-Luc Zarader

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stoll, L., Frankel, J., Mirghafori, N. (2007). Speaker Recognition Via Nonlinear Phonetic- and Speaker-Discriminative Features. In: Chetouani, M., Hussain, A., Gas, B., Milgram, M., Zarader, J.-L. (eds) Advances in Nonlinear Speech Processing. NOLISP 2007. Lecture Notes in Computer Science, vol 4885. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77347-4_8

  • DOI: https://doi.org/10.1007/978-3-540-77347-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77346-7

  • Online ISBN: 978-3-540-77347-4

  • eBook Packages: Computer Science, Computer Science (R0)
