Improvement of automatic speech recognition systems utilizing 2D adaptive wavelet transformation applied to recurrence plot of speech trajectories

  • Original Paper
Signal, Image and Video Processing

Abstract

Spectral-based features, typically used in automatic speech recognition (ASR) systems, do not capture the phase information of speech signals. Features that retain the phase of the signal can therefore complement them and improve the performance of the feature extraction (FE) block of an ASR system. In this paper, we propose an adaptive FE method founded on the theories of the reconstructed phase space (RPS) and the recurrence plot (RP). The RP transformation can reveal important aspects of the dynamics of the high-dimensional speech trajectories reconstructed in the RPS. After transforming the speech signal into the image-like RP domain as a matrix, we apply a wavelet-based FE method: a two-dimensional adaptive wavelet transform, implemented through a customized filter bank, extracts dynamical features from the RP matrix for the ASR task. We evaluate the resulting features in an ASR task both alone and in combination with the traditional MFCCs. On the TIMIT speech corpus, combining the proposed features with MFCCs yields a relative improvement of 7.79% in phoneme recognition accuracy over using the MFCC features alone.
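
The pipeline outlined in the abstract (speech frame, then RPS trajectory, then RP matrix, then 2D wavelet features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the embedding dimension `dim`, the delay `tau`, the threshold `eps`, the synthetic test frame, and the use of subband energies as features. In particular, the sketch uses a fixed one-level Haar transform as a stand-in; it does not reproduce the adaptive wavelet or the customized filter bank described by the authors.

```python
import numpy as np

def delay_embed(x, dim=3, tau=2):
    """Time-delay embedding: row i is [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this dim/tau")
    return np.stack([x[k * tau : k * tau + n] for k in range(dim)], axis=1)

def recurrence_plot(traj, eps):
    """Binary recurrence matrix: R[i, j] = 1 when ||traj[i] - traj[j]|| <= eps."""
    diff = traj[:, None, :] - traj[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= eps).astype(float)

def haar_dwt2(img):
    """One level of a separable, orthonormal 2D Haar transform (fixed, not adaptive)."""
    # Crop to even size so rows and columns pair up cleanly.
    img = img[: img.shape[0] // 2 * 2, : img.shape[1] // 2 * 2]
    s = np.sqrt(2.0)
    # Filter along each row (pairs of neighbouring columns).
    lo = (img[:, 0::2] + img[:, 1::2]) / s
    hi = (img[:, 0::2] - img[:, 1::2]) / s
    # Filter along each column (pairs of neighbouring rows).
    ll = (lo[0::2] + lo[1::2]) / s
    lh = (lo[0::2] - lo[1::2]) / s
    hl = (hi[0::2] + hi[1::2]) / s
    hh = (hi[0::2] - hi[1::2]) / s
    return ll, lh, hl, hh

def subband_energy_features(rp):
    """Energy of each wavelet subband of the RP matrix (an assumed feature choice)."""
    return np.array([np.sum(band ** 2) for band in haar_dwt2(rp)])

# Synthetic 200-sample frame standing in for a speech frame (hypothetical input).
t = np.arange(200) / 16000.0
frame = np.sin(2 * np.pi * 440.0 * t)

traj = delay_embed(frame, dim=3, tau=2)   # trajectory in the RPS
rp = recurrence_plot(traj, eps=0.2)       # image-like RP matrix
features = subband_energy_features(rp)    # one energy per subband
```

Because the RP is built from the time-domain trajectory rather than from a magnitude spectrum, features computed on it can carry information that spectral features discard, which is the motivation the abstract gives for combining them with MFCCs.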

Data availability

The dataset employed is referenced in the article.

Funding

No funding.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by SF and YS. The first draft of the manuscript was written by SF and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yasser Shekofteh.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Firooz, S., Almasganj, F. & Shekofteh, Y. Improvement of automatic speech recognition systems utilizing 2D adaptive wavelet transformation applied to recurrence plot of speech trajectories. SIViP 18, 1959–1967 (2024). https://doi.org/10.1007/s11760-023-02921-4

  • DOI: https://doi.org/10.1007/s11760-023-02921-4
