Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition

  • Theoretical advances
  • Pattern Analysis and Applications

Abstract

Wavelet-based front-end processing has gained popularity for its noise-removal capability. This paper proposes a robust automatic speech recognition system that exploits a psycho-acoustically motivated, wavelet-based front-end compensator. Within the compensator, a voice activity detector based on voiced speech probability separates voiced from unvoiced frames and updates the noise statistics. The wavelet packet decomposition tree is designed according to the equivalent rectangular bandwidth (ERB) scale, because the center frequencies of the ERB distribution resemble the frequency response of the human cochlea. Voiced and unvoiced frames are decomposed separately into 24 sub-bands to estimate the average sub-band energy (ASE) of each frame, and the ASE is then used to calculate the threshold value. Finally, Wiener filtering reduces the residual noise before the final reconstruction stage. The proposed system is evaluated on the TIMIT database under various noise conditions. Its phoneme recognition accuracy is compared with that of different baseline and robust features, as well as with existing front-end compensation techniques. The front-end compensator is additionally evaluated in terms of phoneme classification accuracy. Performance improves in all of these experiments.
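
To make the sub-band step concrete, the following is a minimal sketch of how 24 ERB-spaced bands and a per-frame average sub-band energy might be computed. It is not the authors' implementation. The abstract does not give the tree layout or the ERB formula, so the sketch assumes the widely used Glasberg-Moore ERB-rate mapping, a 16 kHz sampling rate, and bands equally spaced on the ERB-rate scale. It also substitutes FFT power bins for wavelet packet coefficients purely to stay dependency-free. All function names are hypothetical.

```python
import numpy as np


def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate scale (assumed here; not stated in the abstract)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (np.asarray(e, dtype=float) / 21.4) - 1.0) / 0.00437


def erb_band_edges(n_bands=24, f_max=8000.0):
    """Edges (Hz) of n_bands bands equally spaced on the ERB-rate scale.

    Equal spacing is an assumption; the paper's wavelet packet tree may
    only approximate such a layout.
    """
    return erb_rate_to_hz(np.linspace(0.0, hz_to_erb_rate(f_max), n_bands + 1))


def average_subband_energy(frame, fs=16000, n_bands=24):
    """Mean power per ERB-spaced band for one frame.

    FFT bins stand in for wavelet packet coefficients (a substitution made
    for this sketch, not the method described in the abstract).
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    edges = erb_band_edges(n_bands, fs / 2.0)
    # Map every FFT bin to a band index in [0, n_bands - 1].
    band = np.clip(np.searchsorted(edges, freqs, side="right") - 1, 0, n_bands - 1)
    sums = np.bincount(band, weights=power, minlength=n_bands)
    counts = np.bincount(band, minlength=n_bands)
    # Bands that receive no bins report zero energy instead of dividing by zero.
    return sums / np.maximum(counts, 1)


if __name__ == "__main__":
    t = np.arange(512) / 16000.0
    tone = np.sin(2 * np.pi * 1000.0 * t)  # 1 kHz test tone
    ase = average_subband_energy(tone)
    print("band with highest ASE:", int(np.argmax(ase)))
```

In this sketch the bands are narrow at low frequencies and widen toward the Nyquist frequency, which is the property that motivates an ERB-like decomposition. A threshold derived from the ASE, as the abstract describes, would be applied per band, but its formula is not given here and is therefore left out.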

Corresponding author

Correspondence to Anirban Bhowmick.

Cite this article

Bhowmick, A., Biswas, A. & Chandra, M. Performance evaluation of psycho-acoustically motivated front-end compensator for TIMIT phone recognition. Pattern Anal Applic 23, 527–539 (2020). https://doi.org/10.1007/s10044-019-00816-0

  • DOI: https://doi.org/10.1007/s10044-019-00816-0