Abstract
A Mel-scaled M-band wavelet filter bank structure is used to extract robust acoustic features for speech recognition. The proposed filter bank allows a flexible frequency partition, decomposing the speech signal into M frequency bands. To quantify the difference between the Mel-scaled M-band wavelet filter bank and the dyadic wavelet filter bank, the relative bandwidth deviation (RBD) and the root mean square bandwidth deviation (RMSBD) are calculated with respect to the baseline (Mel filter bank bandwidth). The proposed filter bank reduces RBD by 40.90% and RMSBD by 49.84% relative to the 24-dyadic wavelet filter bank. On the AMUAV corpus, features extracted with the proposed filter bank improve word recognition accuracy (WRA) over the baseline (MFCC) features across the full SNR range tested (20 dB to 0 dB). For the AMUAV corpus, the maximum improvement in WRA is 3.93% over the baseline features and 3.90% over the dyadic wavelet filter bank features. For the VidTIMIT corpus, the maximum improvement in WRA is 1.64% over the baseline features and 4.43% over the dyadic features.
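The bandwidth comparison described above can be illustrated with a minimal Python sketch. The abstract does not give the formulas for RBD and RMSBD, so the definitions below are assumptions: RMSBD is taken as the root mean square of the per-band bandwidth differences, and RBD as the mean absolute difference normalized by the reference bandwidth. The reference bands are simplified to contiguous, non-overlapping bands equally spaced on the Mel scale (using the common 2595 log10(1 + f/700) mapping), and all function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to the Mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_bandwidths(n_bands, f_max_hz):
    """Widths (Hz) of n_bands contiguous bands equally spaced in Mel.

    Assumption: non-overlapping bands from 0 Hz to f_max_hz, used only
    as a simplified stand-in for the Mel filter bank reference.
    """
    m_max = hz_to_mel(f_max_hz)
    edges = [mel_to_hz(m_max * i / n_bands) for i in range(n_bands + 1)]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def rms_bandwidth_deviation(candidate, reference):
    """Assumed RMSBD: RMS of per-band bandwidth differences (Hz)."""
    if len(candidate) != len(reference):
        raise ValueError("band counts must match")
    squared = [(c - r) ** 2 for c, r in zip(candidate, reference)]
    return math.sqrt(sum(squared) / len(squared))


def relative_bandwidth_deviation(candidate, reference):
    """Assumed RBD: mean of |candidate - reference| / reference."""
    if len(candidate) != len(reference):
        raise ValueError("band counts must match")
    ratios = [abs(c - r) / r for c, r in zip(candidate, reference)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical setup: 24 bands up to 4 kHz, compared against a
    # uniform partition (not the paper's dyadic or M-band structures).
    reference = mel_bandwidths(24, 4000.0)
    uniform = [4000.0 / 24] * 24
    print("RMSBD (Hz):", rms_bandwidth_deviation(uniform, reference))
    print("RBD:", relative_bandwidth_deviation(uniform, reference))
```

To compare two candidate partitions against the same reference, pass each list of band widths to the two deviation functions; the partition with the smaller values follows the Mel bandwidths more closely under these assumed definitions.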
Acknowledgements
The authors would like to acknowledge the Institution of Electronics and Telecommunication Engineers (IETE) for sponsoring the research fellowship during the period of this research.
Cite this article
Upadhyaya, P., Farooq, O. & Abidi, M.R. Mel scaled M-band wavelet filter bank for speech recognition. Int J Speech Technol 21, 797–807 (2018). https://doi.org/10.1007/s10772-018-9545-2
DOI: https://doi.org/10.1007/s10772-018-9545-2