Abstract
In this paper we propose a new feature extraction algorithm based on non-linear prediction: the Neural Predictive Coding (NPC) model which is an extension of the classical LPC one. We apply this model to two significant tasks: phoneme classification and speaker identification. For the first one, the NPC model is trained with a Minimum Classification Error (MCE) criterion. The experiments carried out with the NTIMIT database show an improvement of the classification rates. For speaker identification, we propose a new feature extraction principle based on the NPC model. We also investigate different initialization methods. The new method gives better performances than the traditional ones (LPC, MFCC and PLP).
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Hermansky, H.: Should Recognizers Have Ears? Speech Communication 25, 3–27 (1998)
Mary, L., Rama Murty, K.S., Mahadeva Prasanna, S.R., Yegnanarayana, B.: Features for Speaker and Language Identification. In: Proc. of ISCA Tutorial and Research Workshop on Speaker and Language Recognition (Odyssey 2004), pp. 323–328 (2004)
Gas, B., Zarader, J.L., Chavy, C., Chetouani, M.: Discriminant neural predictive coding applied to phoneme recognition. Neurocomputing 56, 141–166 (2004)
Kleijn, W.B.: Signal Processing Representations of Speech. IEICE Trans. Inf. and Syst. E86-D(3), 359–376 (2003)
Chetouani, M., Gas, B., Zarader, J.L.: Learning vector quantization and neural predictive coding for nonlinear speech feature extraction. In: EUSIPCO (2004)
Jankowski, C., Kalyanswamy, A., Basson, S., Spitz, J.: NTIMIT: A Phonetically Balanced, Continous Speech, Telephone Bandwidth Speech Database. In: ICASSP, vol. 1, pp. 109–112 (1990)
Chetouani, M., Faundez-Zanuy, M., Gas, B., Zarader, J.L.: A new nonlinear speaker parameterization algorithm for speaker identification. In: Proc. of ISCA Tutorial and Research Workshop on Speaker and Language Recognition (Odyssey 2004), pp. 309–314 (2004)
Burrows, T.L.: Speech Processing with Linear and Neural Networks Models. PhD Cambridge (1996)
Ortega-Garcia, J., et al.: Ahumada: a large speech corpus in Spanish for speaker identification and verification. In: ICASSP, vol. 2, pp. 773–776 (1998)
Bimbot, F., Mathan, L.: Text-free speaker recognition using an arithmeticharmonic sphericity measure. In: EUROSPEECH, pp. 169–172 (1991)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chetouani, M., Faundez-Zanuy, M., Gas, B., Zarader, JL. (2005). Non-linear Speech Feature Extraction for Phoneme Classification and Speaker Recognition. In: Chollet, G., Esposito, A., Faundez-Zanuy, M., Marinaro, M. (eds) Nonlinear Speech Modeling and Applications. NN 2004. Lecture Notes in Computer Science(), vol 3445. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11520153_16
Download citation
DOI: https://doi.org/10.1007/11520153_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27441-4
Online ISBN: 978-3-540-31886-6
eBook Packages: Computer ScienceComputer Science (R0)