New Sub-band Processing Framework Using Non-linear Predictive Models for Speech Feature Extraction

Chetouani, Mohamed; Hussain, Amir; Gas, Bruno; Zarader, Jean-Luc

doi:10.1007/11613107_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3817)

Included in the following conference series:

International Conference on Nonlinear Analyses and Algorithms for Speech Processing

Abstract

Speech feature extraction methods are commonly based on time and frequency processing approaches. In this paper, we propose a new framework based on sub-band processing and non-linear prediction. The key idea is to pre-process the speech signal with a filter bank and then compute non-linear predictors from the resulting sub-band signals. The feature extraction method combines several Neural Predictive Coding (NPC) models. We apply this framework to phoneme classification; experiments carried out on the NTIMIT database show improved classification rates compared with the full-band approach. The new method is also shown to outperform the traditional Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) methods.
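The pipeline outlined in the abstract (filter bank, then one non-linear predictor per sub-band, then a combined feature vector) might be sketched as follows. This is only an illustrative sketch under several assumptions, not the authors' implementation: the abstract does not specify the filter-bank design, the number of bands, or the predictor architecture. The windowed-sinc filters, the tiny tanh network, the choice of output-layer weights as features, and every function and parameter name below (`bandpass_fir`, `train_predictor`, `subband_features`, `edges`, and so on) are hypothetical.

```python
import math
import random

def bandpass_fir(lo, hi, taps=31):
    """Hamming-windowed sinc band-pass; lo/hi are normalized to Nyquist (0..1)."""
    mid = (taps - 1) / 2
    def sinc_lp(fc, n):
        # Ideal low-pass impulse response with cutoff fc.
        x = n - mid
        return fc if x == 0 else math.sin(math.pi * fc * x) / (math.pi * x)
    return [(sinc_lp(hi, n) - sinc_lp(lo, n))
            * (0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1)))
            for n in range(taps)]

def fir_filter(h, x):
    """Causal FIR filtering with zero initial state; output has len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def train_predictor(x, order=4, hidden=3, epochs=30, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network that predicts x[n] from the
    previous `order` samples, using plain stochastic gradient descent."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(order)] for _ in range(hidden)]
    a = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    for _ in range(epochs):
        for n in range(order, len(x)):
            past = x[n - order:n]
            hid = [math.tanh(sum(w * p for w, p in zip(row, past))) for row in W]
            err = sum(ai * hi for ai, hi in zip(a, hid)) - x[n]
            for j in range(hidden):
                # Back-propagate through tanh before updating a[j].
                grad_h = err * a[j] * (1 - hid[j] ** 2)
                a[j] -= lr * err * hid[j]
                for i in range(order):
                    W[j][i] -= lr * grad_h * past[i]
    return W, a

def subband_features(signal, edges, order=4, hidden=3):
    """Split `signal` into bands given by consecutive `edges`, train one
    predictor per band, and concatenate the output-layer weights.
    Assumption: output-layer weights serve as the feature vector."""
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fir_filter(bandpass_fir(lo, hi), signal)
        peak = max(abs(v) for v in band) or 1.0
        band = [v / peak for v in band]  # keep SGD steps well scaled
        _, a = train_predictor(band, order, hidden)
        feats.extend(a)
    return feats

# Synthetic two-tone frame standing in for a speech frame (assumption).
frame = [math.sin(0.2 * math.pi * n) + 0.5 * math.sin(0.7 * math.pi * n)
         for n in range(256)]
features = subband_features(frame, edges=[0.0, 0.25, 0.5, 1.0])
```

With three bands and three hidden units per predictor, `features` holds nine values. A full-band baseline in the same sketch would correspond to `edges=[0.0, 1.0]`, which gives a rough sense of how the sub-band and full-band variants compared in the abstract differ structurally.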

Author information

Authors and Affiliations

Laboratoire des Instruments et Systèmes d’Ile-De-France, Université Paris VI, Paris, France
Mohamed Chetouani, Bruno Gas & Jean-Luc Zarader
Dept. of Computing Science and Mathematics, University of Stirling, Scotland, UK
Amir Hussain

Editor information

Editors and Affiliations

Escola Universitària Politècnica de Mataró, UPC, Spain
Marcos Faundez-Zanuy
Escola Universitària Politècnica de Mataró, Spain
Léonard Janer & Antonio Satue-Villar
Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare, (SA), Italy
Anna Esposito
The Auton Lab, Carnegie Mellon University, Pittsburgh, PA, USA
Josep Roure
Escola Universitària Politècnica de Mataró (UPC), Barcelona, Spain
Virginia Espinosa-Duro

About this paper

Cite this paper

Chetouani, M., Hussain, A., Gas, B., Zarader, JL. (2006). New Sub-band Processing Framework Using Non-linear Predictive Models for Speech Feature Extraction. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_25

DOI: https://doi.org/10.1007/11613107_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science, Computer Science (R0)
