Abstract
Many research studies aim to support the diagnosis of specific disorders. Depression, Parkinson's disease and dysphonia are disorders that can manifest in speech, so speech analysis offers a non-invasive and rapid way to support or confirm a diagnosis. Knowledge-based acoustic features have been researched extensively for each disorder. However, which of these features matter and how many are needed remain open questions. Moreover, feature engineering can be time-consuming and requires additional analysis effort. A state-of-the-art alternative is therefore to reuse the feature extraction part of an out-of-domain speech recognition system. In our research, pre-trained x-vector and ECAPA models were used to derive feature vectors. Binary and multiclass classification were performed using Support Vector Machines. Nested cross-validation was applied to select the cost and gamma parameters. Our results show that, in binary classification, disorders can be recognized with pre-trained feature extractors at an accuracy similar to that achieved with knowledge-based features. This highlights the opportunity to skip disorder-specific feature engineering and use the same out-of-domain feature extractor for classification instead. In four-class classification, by contrast, better results were achieved than in our previous research, where knowledge-based features were used. This supports the idea of robust discrimination between disorders.
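The classification setup described in the abstract (an SVM whose cost and gamma parameters are selected by nested cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, an RBF kernel (suggested by the gamma parameter but not stated in the abstract), hypothetical parameter grids and fold counts, and synthetic data standing in for the x-vector or ECAPA embeddings. The helper name `nested_cv_accuracy` is likewise illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_accuracy(X, y, n_outer=5, n_inner=3, seed=0):
    """Return one accuracy per outer fold for an SVM tuned in an inner loop."""
    # Inner loop: selects C (cost) and gamma on the training part of each outer fold.
    inner = StratifiedKFold(n_splits=n_inner, shuffle=True, random_state=seed)
    # Outer loop: estimates accuracy on data never seen during parameter selection.
    outer = StratifiedKFold(n_splits=n_outer, shuffle=True, random_state=seed + 1)

    # Assumption: features are standardized and the kernel is RBF.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # Hypothetical search grid; the abstract does not give the actual values.
    grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1]}

    search = GridSearchCV(model, grid, cv=inner, scoring="accuracy")
    return cross_val_score(search, X, y, cv=outer, scoring="accuracy")


# Synthetic stand-in for embedding vectors: 200 "speakers", 64 dimensions, 4 classes.
X, y = make_classification(
    n_samples=200,
    n_features=64,
    n_informative=16,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
scores = nested_cv_accuracy(X, y)
print("Outer-fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Because the grid search is refit inside every outer fold, the reported accuracy is not biased by the parameter selection, which is the reason for nesting the two loops. The same function covers the binary case when `y` holds two classes.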
Acknowledgements
This work was funded by project no. K128568, which was implemented with support from the National Research, Development and Innovation Fund of Hungary and financed under the K_18 funding scheme.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Jenei, A.Z., Kiss, G., Sztahó, D. (2022). Detection of Speech Related Disorders by Pre-trained Embedding Models Extracted Biomarkers. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science, vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_24
DOI: https://doi.org/10.1007/978-3-031-20980-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20979-6
Online ISBN: 978-3-031-20980-2
eBook Packages: Computer Science, Computer Science (R0)