Abstract
The wavelet transform possesses a multi-resolution property and high localization performance; hence, it can be optimized for speech recognition. In our previous work, we showed that redundant wavelet filter bank parameters perform better in speech recognition tasks because they are much less shift sensitive than those of the critically sampled discrete wavelet transform (DWT). In this paper, three types of wavelet representations are introduced: features based on the dual-tree complex wavelet transform (DT-CWT), the perceptual dual-tree complex wavelet transform, and the four-channel double-density discrete wavelet transform (FCDDDWT). Appropriate filter values for the DT-CWT and FCDDDWT are then proposed. The performance of the proposed wavelet representations is compared in a phoneme recognition task using a special form of time-delay neural network. Performance evaluations confirm that dual-tree complex wavelet filter banks outperform the conventional DWT in speech recognition systems. The proposed perceptual dual-tree complex wavelet filter bank yields an increase in recognition rate of up to approximately 9.82% compared with the critically sampled two-channel wavelet filter bank.
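The shift sensitivity mentioned in the abstract can be illustrated with a toy example. The sketch below is not the DT-CWT or FCDDDWT proposed in the paper. As an assumption made purely for illustration, it uses a one-level Haar filter and compares a critically sampled transform with an undecimated (and therefore redundant) one. The function names, the test signal, and the normalization are all hypothetical choices.

```python
import numpy as np

def haar_detail_decimated(x):
    """One-level Haar detail coefficients, critically sampled (one output per input pair)."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_detail_undecimated(x):
    """One-level Haar detail coefficients without downsampling (circular boundary)."""
    x = np.asarray(x, dtype=float)
    return (x - np.roll(x, -1)) / 2.0

def energy(c):
    """Sum of squared coefficients, a simple stand-in for a subband energy feature."""
    return float(np.sum(c ** 2))

# Toy signal (an assumption): a short rectangular pulse surrounded by zeros.
x = np.zeros(16)
x[5:9] = 1.0
x_shifted = np.roll(x, 1)  # the same pulse delayed by one sample

e_dec = energy(haar_detail_decimated(x))
e_dec_shift = energy(haar_detail_decimated(x_shifted))
e_und = energy(haar_detail_undecimated(x))
e_und_shift = energy(haar_detail_undecimated(x_shifted))

print("decimated:  ", e_dec, "->", e_dec_shift)
print("undecimated:", e_und, "->", e_und_shift)
```

In this toy case the detail-band energy of the critically sampled transform changes when the input is delayed by one sample, because the pulse edges fall inside or between the sample pairs depending on alignment. The undecimated transform commutes with circular shifts, so its energy is unchanged.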


Cite this article
Tohidypour, H.R., Banitalebi-Dehkordi, A. Speech frame recognition based on less shift sensitive wavelet filter banks. SIViP 10, 633–637 (2016). https://doi.org/10.1007/s11760-015-0787-z