A novel pre-processing technique in pathologic voice detection: Application to Parkinson’s disease phonation
Introduction
Neurodegenerative diseases are nowadays among the major threats to health in old age, owing to the substantial increase in the number of people affected [1]. Parkinson’s Disease (PD) is one of the most prevalent neurological disorders, affecting an estimated seven to ten million people worldwide and increasing at a rate of 15.7 cases per 100,000 people [2,3]. PD is a progressive neurodegenerative brain disorder caused by a loss of the dopaminergic neurons found in the substantia nigra. This reduction in dopamine leads to specific motor symptoms such as shaking, rigidity and slowness of movement, as well as non-motor symptoms [1,4]. Hypokinetic dysarthria, the most frequent sign of PD, appears in 90 % of patients with Parkinson’s (PWP) and produces alterations in phonation, articulation and prosody [4,5].
Historically, many studies investigating the factors associated with PD have focused on voice processing and exploited Machine Learning (ML) techniques, which are considered useful tools to differentiate PD patients from healthy individuals (HI) [6–24]. In [6], Tsanas et al. selected features based on Mel Frequency Cepstrum Coefficients (MFCCs) and Harmonic to Noise Ratio (HNR) as relevant for detecting dysarthria. In spite of the good results produced by these classical features, a new non-standard feature named Pitch Period Entropy (PPE) has produced a significant increase in detection rates [7]. Perturbation and HNR-based estimates are also considered relevant features. In [8], these were selected by combining linear SVM classification with three feature selection (FS) approaches (feature ranking, wrapper methods and embedded techniques), and a classification accuracy of 97 % was attained using "rotation forest ensemble" and K-NN classifiers. Moreover, the authors found that sustained vowels are more suitable than words or sentences to discriminate PD from normative voice. Sakar et al. also promote the use of means and standard deviations of features [9]. Based on classical and innovative features, Hariharan et al. [10] achieved good performance using a hybrid intelligent system that includes feature preprocessing using model-based clustering with Gaussian Mixture Models (GMM), feature reduction using Principal Component Analysis (PCA), and feature selection by Linear Discriminant Analysis (LDA). In [11], HNR-type features, wavelet-based estimates and MFCC clustering were selected using Fit-locally and Think-globally (LOGO) feature selection to produce a response such as ‘0’ for “high” or ‘1’ for “low” phonation quality from sustained utterances of /a/.
In our previous work [12], a multinomial Naïve Bayes classifier attained an accuracy of 95 % in detecting pathologic voice belonging to PWP using traditional features extracted from a database created by Sakar et al. [9]. The accuracy obtained improved on earlier results by 3 % by using PCA for feature dimensionality reduction and the Multi-Dimensional Voice Processing Program (MDVP) [13]. In addition, using a compensation/normalization approach on MFCC features extracted from the sustained vowels /a/, /o/ and /u/ of the same database, an accuracy of 97 % was reached by a K-NN classifier [14]. The results reported in [15] show that the SVM classifier achieved the highest classification accuracy (92.21 %) with the first fourteen phonation patterns identified by the Mann-Whitney-Wilcoxon (MWW) feature ranking technique from the 22 features identified in [6]. Random Forest (RF) applied to 20 features selected by Minimum Redundancy Maximum Relevance (MRMR) produced improved performance in detecting pathology from the voice of PD patients. The selected features were converted to complex values to be used as inputs to Complex-Valued Artificial Neural Networks (CVANN) [16]. In [17], the database created by Sakar et al. [9] was split into subsets containing 40 samples from each phonation. In that study, a 10 % improvement over the scores reported in [9] was attained using an Adjusted Multiple-Classifier with Feature Selection (A-MCFS) combined with a K-NN classifier.
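As a rough illustration of the rank-based feature ranking mentioned above, the following sketch computes the Mann-Whitney U statistic for each feature and orders the features by how far U lies from its chance value. It is a simplified stand-in, not the procedure of [15]: the feature names, the values and the `separation` score are hypothetical, and a full analysis would normally rank by the p-value of the test.

```python
def mann_whitney_u(x, y):
    """U statistic of sample x versus y: pairs with x_i > y_j, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def separation(x, y):
    """Score in [0, 1]: 0 when U equals its chance value n*m/2,
    1 when the two groups do not overlap at all."""
    half = len(x) * len(y) / 2
    return abs(mann_whitney_u(x, y) - half) / half


# Hypothetical feature values (healthy group, PD group); not data from the paper.
features = {
    "feature_a": ([0.2, 0.3, 0.4], [0.6, 0.7, 0.9]),  # groups fully separated
    "feature_b": ([0.2, 0.6, 0.9], [0.3, 0.5, 0.8]),  # groups interleaved
}

# Most discriminative feature first.
ranking = sorted(features, key=lambda name: separation(*features[name]), reverse=True)
```

Because the statistic depends only on the ordering of the values, no assumption about the shape of the feature distributions is needed.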
Gomez Vilda et al. [18] proposed a methodology based on articulation features such as the Vowel Space Area (VSA) and the Formant Centralization Ratio (FCR). That study demonstrated the efficiency of these features under gender separation, with the FCR performing well in differentiating dysarthric from healthy speech in relation to treatment effects. However, the same authors concluded in [19] that these static features may not describe well the dynamic behavior of neuromotor articulation. For this purpose, a new kinematic descriptor, defined as the probability distribution of the absolute kinematic velocity of the jaw-tongue system, was proposed to better describe this behavior. In [20], the same authors demonstrated the efficiency of vowel articulation kinematic distributions derived from the first two formants, compared with MFCCs, in detecting dysarthria in PD voices.
The authors of [21] showed through statistical analysis that articulation and phonation features are able to differentiate between HI, PD and Multiple Sclerosis (MS) participants. Eleven different training and test splits of the data were used in [22], and the results were compared in terms of evaluation metric values. The best results were obtained with a split of 75 % of the samples for training and 25 % for testing, with an accuracy of 85 % using an RF classifier.
Braga et al. [23] used a database of 22 speakers with PD containing 1002 speech lines, and a second database of 30 HI containing 785 speech lines, to train their model. A database of 18 PWP containing sustained vowels was used as separate data to validate the proposed model. The authors used features extracted following [6]. The best accuracy, 99.8 %, was obtained with an RF classifier using a Leave-One-Subject-Out (LOSO) strategy.
LOSO is also used in [24] to discriminate between three groups of patients: PD, Multiple System Atrophy (MSA) and other Neurological Diseases (ND) such as functional neurological disorder, somatization, dystonia, cervical dystonia, essential tremor and generalized paroxysmal dystonia. MFCCs, PLP and RASTA-PLP were used as feature extraction methods. The authors claimed accuracies between 92 % and 100 %.
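The LOSO strategy referred to above can be sketched as follows: each fold holds out every recording of one speaker for testing and trains on the recordings of all other speakers, so that no speaker appears on both sides of a split. This is a minimal sketch under stated assumptions; the function name and the recording-to-subject mapping are hypothetical and are not taken from [23] or [24].

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per distinct subject."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    for held_out in dict.fromkeys(subject_ids):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical mapping: five recordings produced by three speakers.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))

for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by subject rather than by recording matters when a speaker contributes several recordings, since a random split could otherwise place recordings of the same voice in both the training and the test set.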
Most of the reviewed classification studies relied on parametric representations that assume a normal distribution, namely the means and standard deviations of features estimated on signal frames from each subject. It must be taken into account, however, that the mean and standard deviation represent the distribution of a given feature accurately only when that distribution is normal. Otherwise, they cannot be considered sufficiently robust estimates of the statistical properties of the features under consideration. Moreover, it is known beforehand that most voice-derived features do not meet the strong requirement of normality. To overcome this problem and to preserve the statistical information present in each voice feature, this paper proposes a novel approach based on the probability density functions of sets of perturbation, biomechanical and neurological phonation features in PD voice analysis. The main idea is to use the voice feature distributions from each subject, instead of normal distribution parameters, as an independent data sample, and to make use of non-parametric tests and estimation methodologies. In addition, the use of feature distributions may help to deal with the problem of data scarcity, making it possible to apply ML classification methods that require a large amount of data for training detection or classification models. Consequently, in this study, each of the estimated features is represented by one vector describing its statistical distribution rather than by its mean and/or standard deviation. In the learning stage, these features were used as inputs to be classified as “healthy” or “pathological” by three types of ML classifiers: a K-Nearest Neighbor algorithm (K-NN), a Support Vector Machine (SVM), and a Random Forest (RF) classifier. The performances of these classifiers are then compared by means of evaluation metrics.
Finally, in order to test the efficiency of our approach, its performance was contrasted with that of the same three classifiers using parametric estimates, namely the mean and standard deviation of the proposed features. A comparative analysis of the two approaches shows the substantially better performance of distribution data with respect to the parametric approach.
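The difference between the two representations can be illustrated with a small sketch. Two hypothetical per-frame feature tracks are built to share the same mean and standard deviation, so the parametric summary cannot tell them apart, whereas a normalized histogram (used here as a simple stand-in for the estimated probability density) can. The values, the bin range and the number of bins are assumptions chosen for illustration and are not the settings used in this study.

```python
import math
from statistics import mean, pstdev


def histogram_vector(values, lo, hi, n_bins):
    """Return relative frequencies of values over n_bins equal bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp the index so that values at or beyond the edges fall in the end bins.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


# Hypothetical feature tracks for two subjects (not data from the paper).
subject_a = [-1.0, -1.0, 1.0, 1.0]                    # bimodal
subject_b = [-math.sqrt(2), 0.0, 0.0, math.sqrt(2)]   # peaked at the center

# Parametric representation: (mean, standard deviation) per subject.
param_a = (mean(subject_a), pstdev(subject_a))
param_b = (mean(subject_b), pstdev(subject_b))

# Distribution representation: one vector per subject and feature.
dist_a = histogram_vector(subject_a, -2.0, 2.0, 4)
dist_b = histogram_vector(subject_b, -2.0, 2.0, 4)

print(param_a, param_b)  # practically identical summaries
print(dist_a, dist_b)    # clearly different distribution vectors
```

A vector of this kind can be passed to K-NN, SVM or RF classifiers in the same way as any other fixed-length feature vector, provided the same bin edges are used for every subject.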
The rest of the paper is organized as follows. Section 2 describes feature extraction. The proposed methodology is explained in Section 3. Results are presented and discussed in Section 4. Finally, conclusions are summarized in Section 5.
Feature extraction
When air flows through the larynx, it induces vocal fold vibration, which is the basis of high-quality voice production. During phonation, a mucosal wave is created by the opening and closing of the vocal folds. Fig. 1 shows the four principal phases of one vocal fold cycle [25]. As can be seen, the cycle starts at the contact phase, which coincides with the maximum closure of the glottis (1); then, in (2−4), the air coming from the lungs forces the subglottal rim of the
Methodology
The proposed methodology is presented in Fig. 3. It is composed of six main steps. In the first and second steps, data from several voice recordings are collected separately by gender for the sustained vowel /a/. In the third step, voice features are extracted from each set of recordings. The proposed pre-processing approach is presented in step four, and the relevance of the proposed features is evaluated in the fifth step. Finally, to test the performance of our method, a ML
Dataset
The dataset consisted of 42 sustained utterances of the vowel /a/ produced by 16 healthy individuals (8 female and 8 male) aged 45 to 83 (mean 62.53, standard deviation 10.79), and 26 PD patients (12 male and 14 female) aged 39 to 79 (mean 63.76, standard deviation 9.69). The patients had suffered from PD for 0 to 13 years since first diagnosis, with general UPDRS scores varying between 3 and 55 (mean = 25, standard deviation = 15.10).
The utterances were extracted from the
Conclusions
In this paper, a method is presented to classify subjects as pathological or healthy based on their phonation, using relevance analysis on perturbation, biomechanical and neurological features in Parkinson's disease. The approach is based on data selection by relevance, taking into consideration the whole feature distributions rather than means and standard deviations from supposedly normal distributions. Graphical histograms showed that the estimated features do not support such a
Funding
No funding was received for this work.
Intellectual property
We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property
Research ethics
We further confirm that any aspect of the work covered in this manuscript that has involved human patients has been conducted with the ethical approval of all relevant bodies and that such approvals are acknowledged within the manuscript.
IRB approval was obtained (required for studies and series of 3 or more cases)
Written consent to publish potentially identifying information, such as details of the case and photographs, was obtained from the patient(s) or their legal guardian(s).
Authorship
All listed authors meet the ICMJE criteria. We attest that all authors contributed significantly to the creation of this manuscript, each having fulfilled criteria as established by the ICMJE.
We confirm that the manuscript has been read and approved by all named authors.
We confirm that the order of authors listed in the manuscript has been approved by all named authors
Declaration of Competing Interest
The authors report no declarations of interest.
References (42)
- et al. Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease. IEEE Trans. Neural Syst. Rehabil. Eng. (2013)
- et al. Computer-aided diagnosis of Parkinson’s disease using complex-valued neural networks and mRMR feature selection algorithm. J. Healthc. Eng. (2015)
- et al. Evaluation of train and test performance of machine learning algorithms and Parkinson diagnosis with statistical measurements. Med. Biol. Eng. Comput. (2020)
- et al. Detecting multiple system atrophy, Parkinson and other neurological disorders using voice analysis. Int. J. Speech Technol. (2017)
- et al. Joint analysis of vocal jitter, flutter and tremor in vowels sustained by normophonic and parkinson speakers. Models and Analysis of Vocal Emissions for Biomedical Applications (2019)
- et al. Synthesis of voiced sounds from a two‐mass model of the vocal cords. Bell Syst. Tech. J. (1972)
- et al. Vocal fold stiffness estimates for emotion description in speech. Book Vocal Fold Stiffness Estimates for Emotion Description in Speech (2013)
- et al. Acoustic discrimination of pathological voice. J. Speech Lang. Hear. Res. (2001)
- et al. Diagnosis of vocal and voice disorders by the speech signal. New Challenges and Perspectives for the New Millennium (2000)
- An essay on the shaking palsy. J. Neuropsychiatry Clin. Neurosci. (2002)
- Toxic proteins in neurodegenerative disease. Science
- Articulatory movements during vowels in speakers with dysarthria and healthy controls. J. Speech Lang. Hear. Res.
- Speech disorders in Parkinson’s disease: early diagnostics and effects of medication and brain stimulation. J. Neural Transm.
- Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans. Biomed. Eng.
- Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng.
- SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. J. Med. Syst.
- Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J. Biomed. Health Inform.
- A new hybrid intelligent system for accurate detection of Parkinson’s disease. Comput. Methods Programs Biomed.
- Parkinson’s Disease Recognition by Speech Acoustic Parameters Classification. Modelling and Implementation of Complex Systems
- Features dimensionality reduction and multi-dimensional voice processing program to parkinson disease discrimination
- Healthy and parkinson voices discrimination based on compensation/normalization cepstral features