Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition

Bhowmick, Anirban; Biswas, Astik; Chandra, Mahesh

doi:10.1007/s10044-019-00816-0

Theoretical advances
Published: 03 April 2019

Volume 23, pages 527–539 (2020)

Pattern Analysis and Applications

Abstract

Wavelet-based front-end processing has gained popularity for its noise-removal capability. This paper proposes a robust automatic speech recognition system built on a psycho-acoustically motivated, wavelet-based front-end compensator. Within the compensator, a voice activity detector based on voiced speech probability separates voiced from unvoiced frames and updates the noise statistics. The wavelet packet decomposition tree is designed according to the equivalent rectangular bandwidth (ERB) scale, because the centre frequencies of the ERB distribution resemble the frequency response of the human cochlea. Voiced and unvoiced frames are decomposed separately into 24 sub-bands to estimate the average sub-band energy (ASE) of each frame, and the ASE is then used to calculate the threshold value. Finally, Wiener filtering reduces the residual noise before the final reconstruction stage. The proposed system is evaluated on the TIMIT database under various noise conditions. Its phoneme recognition accuracy is compared with different baseline and robust features as well as with existing front-end compensation techniques. The proposed front-end compensator is also evaluated in terms of phoneme classification accuracy. Performance improvement is observed in all of these experiments.
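
The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their formulas, so the sketch assumes the Glasberg and Moore ERB-rate formula for placing 24 band centres, an 8 kHz upper limit (half of TIMIT's 16 kHz sampling rate), generic soft thresholding with the threshold left as a plain parameter rather than the paper's ASE-derived rule, and the textbook Wiener gain SNR / (1 + SNR). All function names are hypothetical, and the sub-band coefficients are assumed to come from a wavelet packet decomposition computed elsewhere.

```python
import numpy as np


def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale (Glasberg and Moore formula)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(erb_rate, dtype=float) / 21.4) - 1.0) / 0.00437


def erb_centre_frequencies(n_bands=24, f_max_hz=8000.0):
    """Centre frequencies of n_bands bands equally wide on the ERB-rate scale.

    Assumption: bands span 0 Hz to f_max_hz; the paper's tree may differ.
    """
    edges = np.linspace(0.0, hz_to_erb_rate(f_max_hz), n_bands + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return erb_rate_to_hz(centres)


def average_subband_energy(subbands):
    """Mean squared coefficient value of each sub-band in one frame."""
    return np.array([np.mean(np.square(band)) for band in subbands])


def soft_threshold(coeffs, threshold):
    """Shrink coefficients towards zero by threshold (generic soft thresholding)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)


def wiener_gain(snr_prior):
    """Textbook Wiener gain for a given a priori SNR on a linear scale."""
    snr_prior = np.asarray(snr_prior, dtype=float)
    return snr_prior / (1.0 + snr_prior)


if __name__ == "__main__":
    centres = erb_centre_frequencies()
    print("Number of bands:", centres.size)
    print("Lowest and highest centre (Hz):", centres[0], centres[-1])
```

Spacing the bands evenly on the ERB-rate scale gives narrow bands at low frequencies and wide bands at high frequencies, which is the property the abstract attributes to the cochlea-like decomposition.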

Author information

Authors and Affiliations

SEEE, Department of ECE, VIT Bhopal University, Bhopal, India
Anirban Bhowmick
Department of Electrical and Electronic Engineering, Stellenbosch University, Stellenbosch, South Africa
Astik Biswas
Department of ECE, Birla Institute of Technology, Mesra, Ranchi, India
Mahesh Chandra

Corresponding author

Correspondence to Anirban Bhowmick.

About this article

Cite this article

Bhowmick, A., Biswas, A. & Chandra, M. Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition. Pattern Anal Applic 23, 527–539 (2020). https://doi.org/10.1007/s10044-019-00816-0

Received: 17 September 2017
Accepted: 25 March 2019
Published: 03 April 2019
Issue Date: May 2020
DOI: https://doi.org/10.1007/s10044-019-00816-0
