
MLP Internal Representation as Discriminative Features for Improved Speaker Recognition

  • Conference paper
Nonlinear Analyses and Algorithms for Speech Processing (NOLISP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3817)

Abstract

Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR), projecting features onto the pre-squashed outputs (the values taken before the output nonlinearity) of a one-hidden-layer multilayer perceptron (MLP) trained to recognise speech sub-units (phonemes) has previously been shown to increase ASR performance significantly. An analogous approach cannot be applied directly to speaker recognition, because there is no recognised set of "speaker sub-units" to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output for each target speaker. In this paper we show that the output from the second hidden layer (compression layer) of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from a set of 300 training speakers in TIMIT, can provide a 77% relative error reduction for common Gaussian mixture model (GMM) based speaker identification.
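The abstract describes a two-stage pipeline: acoustic frames are passed through an MLP trained as a 100-speaker classifier, the compression-layer outputs are kept as new features, and those features feed a conventional GMM speaker identifier. The NumPy sketch below illustrates that data flow under several assumptions that the abstract does not state: sigmoid hidden units, the layer sizes, whether the compression layer is read before or after its squashing function (exposed here as a hypothetical `pre_squash` flag), and diagonal-covariance GMMs. MLP training and GMM training by EM are not shown; random weights and toy GMMs stand in for trained models, and every function name is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compression_layer_features(X, weights, biases, pre_squash=True):
    """Map frames X (n_frames, n_in) to second-hidden-layer (compression) outputs.

    weights/biases hold (W, b) for every layer of an (assumed already trained)
    3-hidden-layer MLP; only the first two layers are needed for extraction.
    """
    h1 = sigmoid(X @ weights[0] + biases[0])   # hidden layer 1 (squashed)
    a2 = h1 @ weights[1] + biases[1]           # compression layer, pre-activation
    return a2 if pre_squash else sigmoid(a2)

def gmm_loglik(F, means, variances, log_weights):
    """Total log-likelihood of frames F (n, d) under a diagonal-covariance GMM.

    means, variances: (M, d); log_weights: (M,) log mixture weights.
    """
    d = F.shape[1]
    diff = F[:, None, :] - means[None, :, :]                          # (n, M, d)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.log(variances).sum(axis=1))
    log_comp = log_norm - 0.5 * (diff ** 2 / variances[None]).sum(axis=2)
    log_comp = log_comp + log_weights                                 # (n, M)
    # Log-sum-exp over mixture components, per frame, for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    frame_ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return frame_ll.sum()

def identify_speaker(F, models):
    """Closed-set identification: the speaker whose GMM scores F highest."""
    scores = {spk: gmm_loglik(F, *params) for spk, params in models.items()}
    return max(scores, key=scores.get)

def relative_error_reduction(baseline_error, new_error):
    return (baseline_error - new_error) / baseline_error

# Toy usage with hypothetical sizes: input, hidden 1, compression, hidden 3, 100 outputs.
rng = np.random.default_rng(0)
sizes = [39, 64, 16, 64, 100]
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
frames = rng.normal(size=(200, 39))            # stand-in for acoustic feature frames
feats = compression_layer_features(frames, weights, biases)

def toy_gmm(center):
    """Two-component, unit-variance GMM centred near `center` (illustration only)."""
    means = np.stack([np.full(16, center), np.full(16, center + 0.5)])
    return means, np.ones((2, 16)), np.log(np.array([0.5, 0.5]))

models = {"spk_A": toy_gmm(feats.mean()), "spk_B": toy_gmm(feats.mean() + 10.0)}
predicted = identify_speaker(feats, models)
```

Identification picks the enrolled speaker whose GMM gives the highest total log-likelihood over the test frames. `relative_error_reduction` uses the usual definition behind a figure such as "77% relative error reduction", namely (E_baseline - E_new) / E_baseline.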




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, D., Morris, A., Koreman, J. (2006). MLP Internal Representation as Discriminative Features for Improved Speaker Recognition. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_5


  • DOI: https://doi.org/10.1007/11613107_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31257-4

  • Online ISBN: 978-3-540-32586-4

  • eBook Packages: Computer Science (R0)
