Abstract
Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR), the projection provided by the pre-squashed outputs of a multi-layer perceptron (MLP) with one hidden layer, trained to recognise speech sub-units (phonemes), has previously been shown to significantly increase ASR performance. An analogous approach cannot be applied directly to speaker recognition because there is no recognised set of "speaker sub-units" to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output for each target speaker. In this paper we show that the output from the second hidden layer (compression layer) of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from a set of 300 training speakers in TIMIT, can provide a 77% relative error reduction for common Gaussian mixture model (GMM) based speaker identification.
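As a rough illustration of the idea described in the abstract, the sketch below runs acoustic frames through an MLP with three hidden layers and returns the activations of the second (compression) layer as projected features. It is a minimal sketch, not the authors' implementation: the layer sizes (39 inputs, hidden layers of 100, 20 and 100 units), the tanh non-linearity, the random untrained weights, and the function names `init_mlp` and `compression_features` are all assumptions made for illustration. Only the 100 speaker outputs and the use of the second of three hidden layers come from the abstract.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer.

    A real system would train these weights to classify speakers;
    random values are used here only so the example runs.
    """
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def compression_features(frames, params, layer_index=1):
    """Return the activations of hidden layer `layer_index` (0-based).

    With layer_index=1 this is the second hidden layer, i.e. the
    compression layer. The tanh squashing is an assumption; whether the
    features are taken before or after squashing is a design choice.
    """
    h = frames
    for i, (weights, bias) in enumerate(params):
        h = np.tanh(h @ weights + bias)
        if i == layer_index:
            return h
    raise ValueError("layer_index is beyond the last layer")


# Hypothetical sizes: 39-dim input frames, hidden layers of 100, 20 and
# 100 units, and 100 outputs (one per training speaker in the subset).
sizes = [39, 100, 20, 100, 100]
params = init_mlp(sizes)

# Five random stand-in frames in place of real acoustic features.
frames = np.random.default_rng(1).normal(size=(5, 39))
feats = compression_features(frames, params)
print(feats.shape)  # (5, 20)
```

In a pipeline of the kind the abstract describes, features such as `feats` would replace or supplement the original acoustic features as input to the per-speaker GMMs. The speakers being identified need not be among those used to train the MLP.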
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, D., Morris, A., Koreman, J. (2006). MLP Internal Representation as Discriminative Features for Improved Speaker Recognition. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_5
DOI: https://doi.org/10.1007/11613107_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science (R0)