Abstract
This chapter describes a method for enhancing the differences between speaker classes at the feature level (feature enhancement) in an automatic speaker recognition system. The original Mel-frequency cepstral coefficient (MFCC) space is projected onto a new feature space by a neural network trained on a subset of speakers that is representative of the whole target population. The new feature space discriminates between the target classes (speakers) better than the original feature space does. The chapter focuses on how to select a representative subset of speakers and compares several approaches to speaker selection. The effect of feature enhancement is tested on clean speech and on several types of noisy speech to evaluate its applicability under practical conditions. It is shown that the proposed method leads to a substantial improvement in speaker recognition performance. The method can also be applied to other automatic speaker classification tasks.
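The abstract does not specify the network architecture, which network layer supplies the new features, or which speaker-selection criteria are compared. The following is therefore only a minimal NumPy sketch of the general idea under stated assumptions: a subset of speakers is chosen from per-speaker mean MFCC vectors, a one-hidden-layer MLP is trained to classify that subset's frames by speaker, and the trained network then projects the frames of every speaker, including speakers it never saw, into a new feature space. The selection rule (greedy farthest-point sampling), the tanh MLP, the use of hidden-layer activations as the enhanced features, and the names `select_speakers`, `train_mlp`, `enhance` and `demo` are all hypothetical illustrations rather than the chapter's method. Synthetic Gaussian data stand in for real MFCC frames.

```python
import numpy as np


def select_speakers(speaker_means: np.ndarray, k: int) -> list[int]:
    """Hypothetical selection rule: greedy farthest-point sampling over
    per-speaker mean MFCC vectors, so the subset spreads across the population."""
    centre = speaker_means.mean(axis=0)
    # Start from the speaker closest to the population mean.
    chosen = [int(np.argmin(np.linalg.norm(speaker_means - centre, axis=1)))]
    # Distance from each speaker to its nearest already-chosen speaker.
    dist = np.linalg.norm(speaker_means - speaker_means[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))  # the speaker least covered by the subset
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(speaker_means - speaker_means[nxt], axis=1))
    return chosen


def train_mlp(X, y, n_classes, n_hidden=32, epochs=500, lr=0.1, seed=0):
    """One-hidden-layer MLP (tanh hidden layer, softmax output) trained with
    full-batch gradient descent on mean cross-entropy to classify frames by speaker."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot speaker targets
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        GH = (G @ W2.T) * (1.0 - H ** 2)  # backprop through tanh, using W2 before its update
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ GH)
        b1 -= lr * GH.sum(axis=0)
    return W1, b1, W2, b2


def enhance(X, params):
    """Assumption: the hidden-layer activations serve as the enhanced features."""
    W1, b1, _, _ = params
    return np.tanh(X @ W1 + b1)


def demo(seed=0):
    """Synthetic stand-in for MFCC frames: 10 speakers, 50 frames each, 13 coefficients."""
    rng = np.random.default_rng(seed)
    n_spk, n_frames, n_mfcc, k = 10, 50, 13, 4
    means = rng.normal(0.0, 2.0, (n_spk, n_mfcc))
    frames = means[:, None, :] + rng.normal(0.0, 1.0, (n_spk, n_frames, n_mfcc))
    subset = select_speakers(means, k)
    X = frames[subset].reshape(-1, n_mfcc)
    y = np.repeat(np.arange(k), n_frames)  # labels 0..k-1 for the chosen speakers
    params = train_mlp(X, y, n_classes=k)
    _, _, W2, b2 = params
    acc = float(np.mean(np.argmax(enhance(X, params) @ W2 + b2, axis=1) == y))
    # Project the frames of all speakers, including those not used in training.
    enhanced = enhance(frames.reshape(-1, n_mfcc), params)
    return subset, acc, enhanced


if __name__ == "__main__":
    subset, acc, enhanced = demo()
    print("selected speakers:", subset)
    print("training accuracy on the subset:", acc)
    print("enhanced feature shape:", enhanced.shape)
```

The point the sketch tries to capture is that the projection is learned on a small subset of speakers but applied to the whole population; in a real system the enhanced frames would replace the raw MFCCs as input to the back-end speaker classifier.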
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Koreman, J., Wu, D., Morris, A.C. (2007). Enhancing Speaker Discrimination at the Feature Level. In: Müller, C. (ed.) Speaker Classification I. Lecture Notes in Computer Science, vol. 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_15
DOI: https://doi.org/10.1007/978-3-540-74200-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74186-2
Online ISBN: 978-3-540-74200-5
eBook Packages: Computer Science (R0)