Abstract
Aphasia is a speech and language impairment caused mainly by damage to a specific region of the brain. The goal of this work is to develop a method, namely the Jaya Gazelle algorithm-conv-transformer transducer + deep residual network (JGA_CTT+DRN), for speech intelligibility with aphasia. The input voice signal is first preprocessed using a Gaussian filter. The preprocessed output is passed to the feature extraction phase, which extracts the zero-crossing rate, spectral roll-off, spectral centroid, Mel-frequency cepstral coefficients (MFCC), probability of voicing, linear prediction cepstral coefficients (LPCC), chromagram, empirical mode decomposition (EMD), and the statistical features energy and entropy. Nonlinear spectral subtraction is then applied for voice enhancement. Following that, voice recognition is performed using a CTT, which is trained using Jaya Gazelle optimization (JGO), created by fusing the Jaya algorithm and the Gazelle optimization algorithm (GOA). Finally, once the language and pronunciation models have been developed, speech is transformed into text. The developed JGA_CTT+DRN is evaluated on three metrics, namely positive predictive value (PPV), recognition accuracy, and negative predictive value (NPV), and attains values of 92%, 93%, and 91.9%, respectively.
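The front end described above (Gaussian filtering followed by frame-level features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 16 kHz sampling rate, the 25 ms frames with a 10 ms hop, the Gaussian width, and the 85% roll-off threshold are all assumptions chosen for illustration, and only three of the listed features (zero-crossing rate, spectral centroid, spectral roll-off) are shown.

```python
import numpy as np

def gaussian_smooth(signal, sigma=2.0):
    """Smooth a 1-D signal with a normalized Gaussian kernel (sigma in samples, assumed)."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()  # unit gain for constant input
    return np.convolve(signal, kernel, mode="same")

def frame_signal(signal, frame_len=400, hop=160):
    """Split a signal into overlapping frames; trailing samples are dropped."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def spectral_centroid_and_rolloff(frames, sample_rate, rolloff=0.85):
    """Magnitude-weighted mean frequency and roll-off frequency, per frame."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    cumulative = np.cumsum(mag, axis=1)
    # First bin at which the cumulative magnitude reaches the roll-off fraction.
    bin_idx = np.argmax(cumulative >= rolloff * cumulative[:, -1:], axis=1)
    return centroid, freqs[bin_idx]

if __name__ == "__main__":
    sample_rate = 16000  # assumed; not stated in the abstract
    t = np.arange(sample_rate) / sample_rate
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a voice recording: a tone plus white noise.
    noisy = np.sin(2 * np.pi * 1000 * t + 0.3) + 0.05 * rng.standard_normal(t.size)
    frames = frame_signal(gaussian_smooth(noisy))
    zcr = zero_crossing_rate(frames)
    centroid, roll = spectral_centroid_and_rolloff(frames, sample_rate)
    features = np.column_stack([zcr, centroid, roll])
    print(features.shape)
```

Each row of `features` describes one frame. In a fuller pipeline the remaining features (MFCC, LPCC, chromagram, EMD, energy, entropy) would be appended as further columns before enhancement and recognition.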









Data availability
The TalkBank dataset was obtained from “https://talkbank.org/DB/”, accessed in July 2023.
Acknowledgements
This work was supported by the AICTE, Government of India, through Research Promotion Scheme File No. 8-100/FDC/RPS/POLICY-1/2021-22.
Funding
This research did not receive any specific funding.
Author information
Contributions
All authors have made substantial contributions to the conception and design, revising the manuscript, and the final approval of the version to be published. Also, all authors agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Not applicable.
Informed consent
Not applicable.
About this article
Cite this article
Rajendran, R., Chandrasekar, A. Conv-transformer-based Jaya Gazelle optimization for speech intelligibility with aphasia. SIViP 18, 2079–2094 (2024). https://doi.org/10.1007/s11760-023-02844-0
DOI: https://doi.org/10.1007/s11760-023-02844-0