A Deep Learning Approach for Voice Disorder Detection for Smart Connected Living Environments

Published: 15 October 2021

Abstract

Edge Analytics and Artificial Intelligence are key enablers of today's smart connected living communities. In a society where people, homes, cities, and workplaces are simultaneously connected through various devices, primarily mobile devices, a considerable amount of data is exchanged, and processing and storing these data are laborious and difficult tasks. Edge Analytics allows such data to be collected and analysed directly on mobile devices, such as smartphones and tablets, without relying on a cloud-centred architecture that cannot guarantee real-time responsiveness. Meanwhile, Artificial Intelligence techniques are a valid instrument for processing data, limiting computation time and optimising decision processes and predictions in several sectors, such as healthcare. Within this field, this article proposes an approach for evaluating voice quality. A fully automatic algorithm based on Deep Learning classifies a voice as healthy or pathological by analysing spectrogram images extracted from recordings of the vowel /a/, in compliance with the traditional medical protocol. A light Convolutional Neural Network is embedded in a mobile health application to provide an instrument capable of assessing voice disorders in a fast, easy, and portable way. Thus, an ordinary mobile device becomes a screening tool useful for the early diagnosis, monitoring, and treatment of voice disorders. The proposed approach has been tested on a broad set of voice samples, not limited to the most common voice diseases but including all the pathologies present in three different databases, achieving F1-scores on the testing set of 80%, 90%, and 73%. Although the proposed network consists of a reduced number of layers, its results are very competitive with those of other “cutting edge” approaches built on more complex neural networks, and with those of classic deep neural networks such as VGG-16 and ResNet-50.
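
The abstract names two concrete ingredients: a spectrogram computed from a sustained vowel recording, and the F1-score used for evaluation. The sketch below illustrates both using only the Python standard library. It is not the authors' implementation: the frame length, hop size, Hann window, sample rate, synthetic test tone, and the function names `spectrogram_db` and `f1_score` are all assumptions chosen for illustration, and the Convolutional Neural Network itself is not reproduced because the abstract does not describe its layers.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram_db(signal, frame_len=64, hop=32):
    """Magnitude spectrogram in dB from a naive short-time DFT.

    Returns one row per frame; each row has frame_len // 2 + 1 frequency bins.
    frame_len and hop are illustrative values, not taken from the article.
    """
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            # A small offset avoids log10(0) on silent bins.
            row.append(20 * math.log10(abs(acc) + 1e-10))
        rows.append(row)
    return rows


def f1_score(y_true, y_pred, positive=1):
    """F1-score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic stand-in for a recording: a 1 kHz tone sampled at 8 kHz (assumed).
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]
spec = spectrogram_db(tone)

# Hypothetical labels: 1 = pathological, 0 = healthy.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In a real pipeline each row of `spec` would become one column of the image fed to the classifier, and a library routine based on the fast Fourier transform would replace the naive DFT, which is used here only to keep the example dependency-free.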

Published In

ACM Transactions on Internet Technology  Volume 22, Issue 1
February 2022
717 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3483347
Editor: Ling Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2021
Accepted: 01 November 2020
Revised: 01 October 2020
Received: 01 August 2020
Published in TOIT Volume 22, Issue 1

Author Tags

  1. Voice analysis
  2. voice classification
  3. spectrogram
  4. disease detection
  5. deep learning

Qualifiers

  • Research-article
  • Refereed
