MLP Internal Representation as Discriminative Features for Improved Speaker Recognition

Wu, Dalei; Morris, Andrew; Koreman, Jacques

doi:10.1007/11613107_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3817)

Included in the following conference series:

International Conference on Nonlinear Analyses and Algorithms for Speech Processing

Abstract

Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR), the projection provided by the pre-squashed outputs of a one-hidden-layer multi-layer perceptron (MLP) trained to recognise speech sub-units (phonemes) has previously been shown to significantly increase ASR performance. An analogous approach cannot be applied directly to speaker recognition because there is no recognised set of "speaker sub-units" to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output for each target speaker. In this paper we show that the output from the second hidden layer (compression layer) of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from a set of 300 training speakers in TIMIT, can provide a 77% relative error reduction for common Gaussian mixture model (GMM) based speaker identification.
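
The feature-extraction idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes (39 inputs, 100/20/100 hidden units), the sigmoid non-linearity, the random untrained weights, and the choice between pre- and post-squashing output are all assumptions made for illustration; only the three-hidden-layer shape and the 100 speaker outputs come from the abstract. The helper `relative_error_reduction` shows how a relative error reduction is computed, using hypothetical error rates rather than figures from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(layer_sizes, seed=0):
    """Random (untrained) weights, one (W, b) pair per layer. Illustrative only."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def compression_features(params, frames, pre_squash=True):
    """Forward frames up to the second hidden layer and return its output.

    pre_squash=True returns the linear activation before the sigmoid;
    which variant the paper uses is not stated in the abstract (assumption).
    """
    W1, b1 = params[0]
    h1 = sigmoid(frames @ W1 + b1)      # first hidden layer
    W2, b2 = params[1]
    a2 = h1 @ W2 + b2                   # second hidden (compression) layer
    return a2 if pre_squash else sigmoid(a2)

def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error

# Assumed sizes: input, hidden 1, compression, hidden 3, 100 speaker outputs.
sizes = [39, 100, 20, 100, 100]
params = init_mlp(sizes)
frames = np.random.default_rng(1).normal(size=(5, 39))  # 5 dummy feature frames
feats = compression_features(params, frames)
print(feats.shape)  # one 20-dimensional feature vector per frame

# Hypothetical error rates, chosen only to illustrate the arithmetic.
print(relative_error_reduction(0.10, 0.023))
```

In a full system, the projected frames would replace the original acoustic features as input to per-speaker GMM training and scoring; the layers above the compression layer are needed only while training the MLP.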

Author information

Authors and Affiliations

Institute of Phonetics, Saarland University, P.O. Box 15 11 50, D-66041, Saarbrücken, Germany
Dalei Wu, Andrew Morris & Jacques Koreman

Editor information

Editors and Affiliations

Escola Universitària Politècnica de Mataró, UPC, Spain
Marcos Faundez-Zanuy
Escola Universitària Politècnica de Mataró, Spain
Léonard Janer & Antonio Satue-Villar
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, (SA), Italy
Anna Esposito
The Auton Lab, Carnegie Mellon University, Pittsburgh, PA, USA
Josep Roure
Escola Universitària Politècnica de Mataró (UPC), Barcelona, Spain
Virginia Espinosa-Duro

Cite this paper

Wu, D., Morris, A., Koreman, J. (2006). MLP Internal Representation as Discriminative Features for Improved Speaker Recognition. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_5

DOI: https://doi.org/10.1007/11613107_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science (R0)
