Conv-transformer-based Jaya Gazelle optimization for speech intelligibility with aphasia

Rajendran, Ranjith; Chandrasekar, Arumugam

doi:10.1007/s11760-023-02844-0

Original Paper
Published: 26 December 2023

Volume 18, pages 2079–2094, (2024)

Signal, Image and Video Processing

Ranjith Rajendran¹ &
Arumugam Chandrasekar¹

Abstract

Aphasia is a speech impairment caused mainly by damage to a specific region of the brain. This work develops a method, the Jaya Gazelle algorithm-conv-transformer transducer + deep residual network (JGA_CTT+DRN), for speech intelligibility with aphasia. The input voice signal is first preprocessed with a Gaussian filter. The preprocessed output is passed to the feature extraction phase, which extracts the zero-crossing rate, spectral roll-off, spectral centroid, Mel-frequency cepstral coefficients (MFCC), probability of voicing, linear prediction cepstral coefficients (LPCC), chromagram, empirical mode decomposition (EMD), and the statistical features energy and entropy. Nonlinear spectral subtraction is then applied for voice enhancement. Voice recognition is then performed using a conv-transformer transducer (CTT), which is trained using Jaya Gazelle optimization (JGO), a fusion of the Jaya algorithm and the Gazelle optimization algorithm (GOA). Finally, speech is transformed into text once the language and pronunciation models have been developed. The developed JGA_CTT+DRN is evaluated on three metrics, positive predictive value (PPV), recognition accuracy, and negative predictive value (NPV), achieving 92%, 93%, and 91.9%, respectively.
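
The abstract states that the CTT is trained with JGO, a fusion of the Jaya algorithm and GOA, but it does not give the hybrid update rule. As a rough illustration only, the sketch below shows the standard Jaya update (each candidate moves toward the current best solution and away from the current worst) on a toy objective. The Gazelle component and the way the authors combine the two are not described in the abstract and are deliberately omitted; the names `sphere` and `jaya_minimize`, the population size, and the bounds are all hypothetical choices made for this example.

```python
import random


def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(v * v for v in x)


def jaya_minimize(objective, dim, bounds, pop_size=20, iterations=100, seed=0):
    """Plain Jaya search; returns (best_solution, best_score_history).

    This is NOT the paper's JGO hybrid, only the basic Jaya step.
    """
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(dim)]
                  for _ in range(pop_size)]
    scores = [objective(p) for p in population]
    history = [min(scores)]

    for _ in range(iterations):
        # Best and worst are fixed at the start of each iteration.
        best = population[scores.index(min(scores))]
        worst = population[scores.index(max(scores))]
        for i, current in enumerate(population):
            candidate = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best solution and away from the worst one.
                value = (current[j]
                         + r1 * (best[j] - abs(current[j]))
                         - r2 * (worst[j] - abs(current[j])))
                candidate.append(min(max(value, low), high))  # clamp to bounds
            candidate_score = objective(candidate)
            # Greedy acceptance: keep the candidate only if it improves.
            if candidate_score < scores[i]:
                population[i] = candidate
                scores[i] = candidate_score
        history.append(min(scores))

    best_index = scores.index(min(scores))
    return population[best_index], history
```

In a training setting the objective would be a loss evaluated on model parameters rather than a toy function; calling `jaya_minimize(sphere, dim=3, bounds=(-5.0, 5.0))` simply shows that the best score never gets worse from one iteration to the next, which follows from the greedy acceptance step.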

Data availability

The TalkBank dataset was obtained from https://talkbank.org/DB/ (accessed in July 2023).

Acknowledgements

This work was supported by the AICTE, Government of India, through the Research Promotion Scheme, File No. 8-100/FDC/RPS/POLICY-1/2021-22.

Funding

This research did not receive any specific funding.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, St. Joseph’s College of Engineering, OMR, Chennai 119, India
Ranjith Rajendran & Arumugam Chandrasekar

Contributions

All authors have made substantial contributions to the conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Ranjith Rajendran.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rajendran, R., Chandrasekar, A. Conv-transformer-based Jaya Gazelle optimization for speech intelligibility with aphasia. SIViP 18, 2079–2094 (2024). https://doi.org/10.1007/s11760-023-02844-0

Received: 07 August 2023
Revised: 20 September 2023
Accepted: 13 October 2023
Published: 26 December 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s11760-023-02844-0
