Abstract
Speech feature extraction methods are commonly based on time and frequency processing approaches. In this paper, we propose a new framework based on sub-band processing and non-linear prediction. The key idea is to pre-process the speech signal with a filter bank and then compute non-linear predictors from the resulting sub-band signals. The feature extraction method thus combines several Neural Predictive Coding (NPC) models. We apply this framework to phoneme classification; experiments on the NTIMIT database show improved classification rates compared with the full-band approach. The new method also outperforms the traditional Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) methods.
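The pipeline outlined in the abstract (filter bank, then one non-linear predictor per sub-band) can be illustrated with a minimal sketch. This is not the authors' NPC implementation. The band edges, the windowed-sinc FIR filters, the predictor order, the fixed random hidden layer, the LMS training of the output weights, and the use of those output weights as features are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math
import random

def bandpass_fir(low, high, num_taps=31):
    """Windowed-sinc band-pass FIR; cutoffs are in cycles/sample (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2 * (high - low)
        else:
            ideal = (math.sin(2 * math.pi * high * k)
                     - math.sin(2 * math.pi * low * k)) / (math.pi * k)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    return taps

def fir_filter(signal, taps):
    """Causal FIR convolution; the output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, tap in enumerate(taps):
            if i - j >= 0:
                acc += tap * signal[i - j]
        out.append(acc)
    return out

def fit_predictor(band, order, hidden_w, epochs=30, lr=0.05):
    """Predict band[t] from the previous `order` samples through a tanh
    hidden layer. Only the output weights are trained (LMS updates);
    the hidden layer is kept fixed. Assumption, not the paper's scheme."""
    out_w = [0.0] * len(hidden_w)
    for _ in range(epochs):
        for t in range(order, len(band)):
            past = band[t - order:t]
            h = [math.tanh(sum(w * x for w, x in zip(row, past)))
                 for row in hidden_w]
            err = band[t] - sum(o * hi for o, hi in zip(out_w, h))
            out_w = [o + lr * err * hi for o, hi in zip(out_w, h)]
    return out_w

def subband_features(frame, band_edges, order=4, hidden=3, seed=0):
    """Concatenate the trained output weights of one predictor per sub-band."""
    rng = random.Random(seed)
    # Hypothetical fixed hidden layer shared by all sub-bands.
    hidden_w = [[rng.uniform(-1, 1) for _ in range(order)]
                for _ in range(hidden)]
    features = []
    for low, high in band_edges:
        band = fir_filter(frame, bandpass_fir(low, high))
        features.extend(fit_predictor(band, order, hidden_w))
    return features

# Usage: a synthetic two-tone frame split into four equal-width bands.
frame = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.3 * n)
         for n in range(256)]
edges = [(0.0, 0.125), (0.125, 0.25), (0.25, 0.375), (0.375, 0.5)]
features = subband_features(frame, edges)
print(len(features), "features")
```

Each sub-band contributes `hidden` coefficients, so the feature vector grows linearly with the number of bands. A full-band baseline in the same sketch would correspond to a single band covering 0 to 0.5 cycles/sample.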
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chetouani, M., Hussain, A., Gas, B., Zarader, JL. (2006). New Sub-band Processing Framework Using Non-linear Predictive Models for Speech Feature Extraction. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_25
DOI: https://doi.org/10.1007/11613107_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science, Computer Science (R0)