Mel scaled M-band wavelet filter bank for speech recognition

Upadhyaya, Prashant; Farooq, Omar; Abidi, M. R.

doi:10.1007/s10772-018-9545-2

Published: 07 August 2018

Volume 21, pages 797–807 (2018)
International Journal of Speech Technology

Abstract

A Mel scaled M-band wavelet filter bank structure is used to extract robust acoustic features for speech recognition. The proposed filter bank offers a flexible frequency partition that decomposes the speech signal into M frequency bands. To quantify the difference between the Mel scaled M-band wavelet filter bank and the dyadic wavelet filter bank, the relative bandwidth deviation (RBD) and the root mean square bandwidth deviation (RMSBD) are calculated with respect to the baseline (Mel filter bank bandwidth). The proposed filter bank gives reductions of 40.90% in RBD and 49.84% in RMSBD over the 24-dyadic wavelet filter bank. On the AMUAV corpus, features extracted with the proposed filter bank improve word recognition accuracy (WRA) over baseline (MFCC) features across the whole SNR range tested (20 dB to 0 dB). For the AMUAV corpus, the proposed features show a maximum improvement in WRA of 3.93% over baseline features and 3.90% over dyadic wavelet filter bank features. When applied to the VidTIMIT corpus, the proposed features show a maximum improvement in WRA of 1.64% over baseline features and 4.43% over dyadic features.
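The bandwidth comparison described in the abstract can be illustrated with a short sketch. The abstract does not give the formulas for RBD or RMSBD, nor the Mel mapping used, so everything below is an assumption made for illustration: the common mapping mel = 2595 log10(1 + f/700), a mean relative absolute deviation standing in for RBD, and a root mean square difference standing in for RMSBD. All function names are hypothetical and the band counts are toy values, not the paper's configuration.

```python
import math


def hz_to_mel(f_hz):
    # Common Mel mapping (assumed; the paper may use a different variant).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_bandwidths(n_bands, f_max_hz):
    """Widths in Hz of n_bands contiguous bands equally spaced on the Mel scale."""
    mel_max = hz_to_mel(f_max_hz)
    edges = [mel_to_hz(mel_max * i / n_bands) for i in range(n_bands + 1)]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def dyadic_bandwidths(levels, f_max_hz):
    """Widths in Hz of the levels + 1 octave bands of a dyadic wavelet tree,
    ordered from the lowest band to the highest."""
    widths = [f_max_hz / 2 ** levels]  # approximation (lowest) band
    widths += [f_max_hz / 2 ** k for k in range(levels, 0, -1)]  # detail bands
    return widths


def relative_deviation(candidate, reference):
    # Assumed stand-in for RBD: mean of |candidate - reference| / reference.
    return sum(abs(c - r) / r for c, r in zip(candidate, reference)) / len(reference)


def rms_deviation(candidate, reference):
    # Assumed stand-in for RMSBD: root mean square of the bandwidth differences.
    return math.sqrt(
        sum((c - r) ** 2 for c, r in zip(candidate, reference)) / len(reference)
    )


if __name__ == "__main__":
    f_max = 4000.0  # toy value: upper band edge in Hz
    mel = mel_bandwidths(4, f_max)        # 4 Mel-spaced reference bands
    dyadic = dyadic_bandwidths(3, f_max)  # 3-level dyadic tree, also 4 bands
    print("Mel bandwidths (Hz):   ", [round(w, 1) for w in mel])
    print("Dyadic bandwidths (Hz):", dyadic)
    print("relative deviation:", relative_deviation(dyadic, mel))
    print("RMS deviation (Hz):", rms_deviation(dyadic, mel))
```

A dyadic tree can only halve bands, so its widths are fixed at octave steps, whereas an M-band partition has more freedom to track the Mel-spaced reference widths; a smaller deviation from the reference is what the reported RBD and RMSBD reductions express.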


Acknowledgements

The authors would like to acknowledge the Institution of Electronics and Telecommunication Engineers (IETE) for sponsoring the research fellowship during this period of research.

Author information

Authors and Affiliations

Department of Electronics Engineering, Aligarh Muslim University, Aligarh, Uttar Pradesh, India
Prashant Upadhyaya, Omar Farooq & M. R. Abidi

Corresponding author

Correspondence to Prashant Upadhyaya.

About this article

Upadhyaya, P., Farooq, O. & Abidi, M.R. Mel scaled M-band wavelet filter bank for speech recognition. Int J Speech Technol 21, 797–807 (2018). https://doi.org/10.1007/s10772-018-9545-2

Received: 01 March 2018
Accepted: 30 July 2018
Published: 07 August 2018
Issue Date: 15 December 2018
DOI: https://doi.org/10.1007/s10772-018-9545-2
