A neural predictive coding feature extraction scheme in DCT domain for phoneme recognition

Yousefi Azar, Mahmood; Razzazi, Farbod

doi:10.1007/s00521-010-0450-0

Original Article
Published: 05 October 2010

Volume 21, pages 565–574, (2012)
Neural Computing and Applications

Mahmood Yousefi Azar¹ &
Farbod Razzazi¹

Abstract

Nonlinear feature extraction from speech signals has been a major research concern in recent years. In this paper, phoneme feature extraction using the neural predictive coding (NPC) model is generalized to a combination of the time and DCT domains. Two main ideas are proposed and evaluated. First, a frame-wise DCT-based NPC feature extractor is proposed to reduce the computational complexity of the system. This approach applies a DCT pre-feature extractor to discard unwanted data, and the extracted features are the outputs of the hidden layer. It is shown that this pre-processing stage can improve both computational efficiency and accuracy. In the second approach, DCT-domain features play a complementary role in classic NPC modeling: the approach uses the residual of the predicted signal in the DCT domain. The experiments were conducted on voiced plosive phonemes of the TIMIT database. Simulations showed that the combined method performs well on plosive phonemes, achieving a 70.3% recognition rate on the /b/, /d/, and /g/ phonemes, which is higher than the results of traditional NPC approaches.
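As a rough illustration of the two DCT-domain steps named in the abstract, the sketch below computes an orthonormal DCT-II of a speech frame, keeps only its leading coefficients as pre-features, and transforms a prediction residual into the DCT domain. It is not the authors' implementation: the abstract does not give the frame length, the number of retained coefficients, or the NPC network architecture, so the function names (`dct_ii`, `dct_prefeatures`, `residual_dct`), the `keep` parameter, and the keep-the-first-coefficients rule are all assumptions made for illustration. The neural predictor itself is omitted; `predicted` stands in for its output.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of a 1-D sequence of samples."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(frame)
        )
        # Orthonormal scaling: the DC term uses sqrt(1/n), the rest sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def dct_prefeatures(frame, keep):
    """Keep the first `keep` DCT coefficients of a frame.

    Assumption: truncating to the leading coefficients is one plausible way
    to discard unwanted data; the abstract does not state the rule used.
    """
    return dct_ii(frame)[:keep]


def residual_dct(frame, predicted, keep):
    """DCT-domain view of the prediction residual (frame minus prediction).

    Assumption: `predicted` is the output of some predictor, for example
    an NPC network, which is not modeled here.
    """
    residual = [x - p for x, p in zip(frame, predicted)]
    return dct_ii(residual)[:keep]
```

In a pipeline of this kind, the truncated coefficients from `dct_prefeatures` would be fed to the predictor in place of raw samples, which is where the reduction in input dimension would come from.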

Author information

Authors and Affiliations

Science and Research Branch, Islamic Azad University, Tehran, Iran
Mahmood Yousefi Azar & Farbod Razzazi

Corresponding author

Correspondence to Mahmood Yousefi Azar.

About this article

Cite this article

Yousefi Azar, M., Razzazi, F. A neural predictive coding feature extraction scheme in DCT domain for phoneme recognition. Neural Comput & Applic 21, 565–574 (2012). https://doi.org/10.1007/s00521-010-0450-0

Received: 08 April 2010
Accepted: 15 September 2010
Published: 05 October 2010
Issue Date: April 2012
DOI: https://doi.org/10.1007/s00521-010-0450-0
