A novel pre-processing technique in pathologic voice detection: Application to Parkinson’s disease phonation

https://doi.org/10.1016/j.bspc.2021.102604

Highlights

  • This paper proposes a methodology that can help in the detection of Parkinson’s disease (PD) from voice recordings.

  • Each proposed feature is represented by a vector describing its statistical distribution, estimated through its probability density function.

  • The features are extracted from 42 samples of the sustained vowel /a/ produced by healthy subjects and PD patients.

  • A new preprocessing technique is then applied, based on pertinent matrices built for each subject.

  • The decision phase is realized by applying three types of Machine Learning (ML) classifiers: a K-Nearest Neighbor algorithm (K-NN), a Support Vector Machine (SVM), and a Random Forest (RF) classifier.

Abstract

This paper proposes a methodology that can help detect Parkinson’s disease (PD) from voice recordings. It is based on eight voice features describing vocal fold behavior, such as frequency and amplitude perturbations, biomechanical instability and neurological tremor. Each feature is represented by a vector describing its statistical distribution, estimated through its probability density function. The features are extracted from 42 samples of the sustained vowel /a/ produced by healthy subjects and PD patients. A new preprocessing technique is then applied. It uses pertinent matrices built for each subject, composed of vectors arranged by segment, feature and number of phonation cycles. Estimates of the maxima maximorum (MM) and minima minimorum (mm) values are used to normalize the data, and each normalized vector is then submitted to an outlier removal process. The relevance of the predictive attributes is tested using rank feature selection. Finally, the decision phase applies three types of Machine Learning (ML) classifiers: a K-Nearest Neighbor (K-NN) algorithm, a Support Vector Machine (SVM) and a Random Forest (RF) classifier. Although all three classifiers yield high recognition rates, the experimental results show that the RF classifier combined with the preprocessing approach performs best, achieving a recognition rate of 99 % for females and 98 % for males in detecting PD dysphonia. These results outperform those previously published in the literature.
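The normalization and outlier-removal steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: MM and mm are assumed to be the largest and smallest values of one feature across all subjects' vectors, normalization is assumed to be a min-max rescaling to [0, 1], and outliers are removed with Tukey's 1.5 × IQR rule, which is an assumed criterion since the abstract does not name one. The function names and data are hypothetical.

```python
import numpy as np


def mm_normalize(vectors):
    """Min-max normalize a list of 1-D vectors (one per subject) of the same feature.

    Assumption: MM is the maximum of the per-vector maxima and mm the
    minimum of the per-vector minima, taken over all subjects.
    """
    maxima_maximorum = max(float(np.max(v)) for v in vectors)
    minima_minimorum = min(float(np.min(v)) for v in vectors)
    span = maxima_maximorum - minima_minimorum
    if span == 0:
        # Degenerate case: all values identical, map everything to 0.
        return [np.zeros_like(v, dtype=float) for v in vectors]
    return [(np.asarray(v, dtype=float) - minima_minimorum) / span for v in vectors]


def remove_outliers_iqr(vector, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, an assumed criterion)."""
    v = np.asarray(vector, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    keep = (v >= q1 - k * iqr) & (v <= q3 + k * iqr)
    return v[keep]


# Hypothetical per-cycle values of one feature for two subjects.
subject_a = np.array([0.010, 0.012, 0.011, 0.013, 0.090])  # last value is an outlier
subject_b = np.array([0.020, 0.022, 0.021, 0.019, 0.023])

normalized = mm_normalize([subject_a, subject_b])
cleaned = [remove_outliers_iqr(v) for v in normalized]
print([len(v) for v in cleaned])  # [4, 5]
```

Because the abstract places normalization before outlier removal, an extreme value still sets MM here; reversing the two steps is a possible alternative, not something the paper states.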

Introduction

Neurodegenerative diseases are nowadays among the major threats to health in old age, owing to the substantial increase in the number of people affected [1]. Parkinson’s Disease (PD) is one of the most prevalent neurological disorders, affecting on average seven to ten million people worldwide, with an incidence of 15.7 cases per 100,000 people [2,3]. PD is a progressive neurodegenerative brain disorder caused by the loss of dopaminergic neurons in the substantia nigra. This reduction in dopamine leads to specific symptoms such as shaking, rigidity and slowness of movement, as well as non-motor symptoms [1,4]. The most frequent sign of PD is hypokinetic dysarthria, present in 90 % of patients with Parkinson’s disease (PWP), which alters phonation, articulation and prosody [4,5].

Many studies of PD-related voice alterations have relied on Machine Learning (ML) techniques, which are considered useful tools for differentiating PD patients from healthy individuals (HI) [[6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]]. In [6], Tsanas et al. selected features based on Mel Frequency Cepstral Coefficients (MFCCs) and the Harmonic-to-Noise Ratio (HNR) as relevant features for detecting dysarthria. Despite the good results produced by these classical features, a new non-standard feature, Pitch Period Entropy (PPE), yielded a significant increase in detection rates [7]. Perturbation and HNR-based estimates are also considered relevant features: in [8], such features, selected by combining linear SVM classification with three feature selection (FS) approaches (feature ranking, wrapper methods and embedded techniques), attained a classification accuracy of 97 % using rotation forest ensemble and K-NN classifiers. Moreover, the authors found that sustained vowels are more suitable than words or sentences for discriminating PD from normative voices. Sakar et al. also promoted the use of the means and standard deviations of features [9]. Based on classical and innovative features, Hariharan et al. [10] achieved good performance with a hybrid intelligent system that includes feature preprocessing by model-based clustering with Gaussian Mixture Models (GMM), feature reduction with Principal Component Analysis (PCA), and feature selection with Linear Discriminant Analysis (LDA). In [11], HNR-type features, wavelet-based estimates and MFCC clusters were selected with the Fit-Locally-and-Think-Globally (LOGO) feature selection method to produce a binary response (‘0’ for “high” or ‘1’ for “low” phonation quality) from sustained utterances of /a/.
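Perturbation features of the kind mentioned above quantify cycle-to-cycle variability of the voice signal. As an illustration only, the sketch below computes the standard local jitter (period perturbation) and local shimmer (amplitude perturbation) from glottal cycle periods and peak amplitudes that are assumed to be already extracted; the exact definitions used in the cited studies, or in this paper, may differ. Keeping the per-cycle values, rather than only their mean, yields a sample whose distribution can then be estimated. The helper name and all values are hypothetical.

```python
import numpy as np


def cycle_to_cycle_perturbation(values):
    """Absolute differences between consecutive cycles, relative to the mean value.

    Applied to periods this gives per-cycle local jitter; applied to peak
    amplitudes it gives per-cycle local shimmer.
    """
    v = np.asarray(values, dtype=float)
    return np.abs(np.diff(v)) / v.mean()


periods_ms = np.array([8.00, 8.10, 7.95, 8.05, 8.00])  # hypothetical glottal periods
amplitudes = np.array([0.80, 0.78, 0.82, 0.79, 0.81])  # hypothetical peak amplitudes

jitter_per_cycle = cycle_to_cycle_perturbation(periods_ms)
shimmer_per_cycle = cycle_to_cycle_perturbation(amplitudes)

# The usual scalar local jitter/shimmer are the means of these per-cycle sequences.
print(jitter_per_cycle.mean(), shimmer_per_cycle.mean())
```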
In our previous work [12], a multinomial Naïve Bayes classifier attained an accuracy of 95 % in detecting pathological voices of PWP, using traditional features extracted from the database created by Sakar et al. [9]. In [13], this accuracy was improved by 3 % by using PCA for feature dimensionality reduction together with the Multi-Dimensional Voice Processing Program (MDVP). In addition, using a compensation/normalization approach on MFCC features extracted from the sustained vowels /a/, /o/ and /u/ of the same database, a K-NN classifier reached an accuracy of 97 % [14]. The results reported in [15] show that the SVM classifier achieved the highest classification accuracy (92.21 %) with the first fourteen phonation patterns selected by the Mann-Whitney-Wilcoxon (MWW) feature ranking technique from the 22 features identified in [6]. In [16], a Random Forest (RF) applied to 20 features selected by Minimum Redundancy Maximum Relevance (MRMR) improved pathology detection from the voices of PD patients; the selected features were also converted to complex values to serve as inputs to Complex-Valued Artificial Neural Networks (CVANN). In [17], the database of Sakar et al. [9] was split into subsets containing 40 samples of each phonation, and an Adjusted Multiple-Classifier with Feature Selection (A-MCFS) combined with a K-NN classifier improved the scores reported in [9] by 10 %.
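MWW feature ranking scores each attribute by how well it separates the two groups without assuming normality. The sketch below is one simple way to do this, in the spirit of [15] but not necessarily its exact procedure (which may, for example, rank by p-value): it computes the Mann-Whitney U statistic of the PD group against the HI group for each column, normalizes it by n1·n2, and ranks columns by the distance of that value from 0.5 (no separation). The function names, the label convention (1 = PD, 0 = HI) and the data are hypothetical.

```python
import numpy as np


def mann_whitney_u(a, b):
    """U statistic of sample a versus b: pairs with a > b, ties counted as 1/2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    greater = np.sum(a[:, None] > b[None, :])
    ties = np.sum(a[:, None] == b[None, :])
    return greater + 0.5 * ties


def mww_rank_features(X, y):
    """Return column indices sorted by |U / (n1 * n2) - 0.5| (largest first) and the scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pd_rows, hi_rows = X[y == 1], X[y == 0]
    n1, n2 = len(pd_rows), len(hi_rows)
    separation = np.array([
        abs(mann_whitney_u(pd_rows[:, j], hi_rows[:, j]) / (n1 * n2) - 0.5)
        for j in range(X.shape[1])
    ])
    return np.argsort(separation, kind="stable")[::-1], separation


# Hypothetical data: column 0 separates the groups fully, column 1 not at all, column 2 partly.
X = np.array([[0.10, 0.5, 0.3],
              [0.20, 0.4, 0.6],
              [0.15, 0.6, 0.2],
              [0.90, 0.5, 0.7],
              [0.80, 0.6, 0.5],
              [0.85, 0.4, 0.4]])
y = np.array([0, 0, 0, 1, 1, 1])
order, separation = mww_rank_features(X, y)
print(order)  # [0 2 1]
```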

Gomez Vilda et al. [18] proposed a methodology based on articulation features such as the Vowel Space Area (VSA) and the Formant Centralization Ratio (FCR). Using gender separation, this study demonstrated the effectiveness of these features, with the FCR performing well in differentiating dysarthric from healthy speech in relation to treatment effects. However, the same authors concluded in [19] that these static features may not adequately describe the dynamic behavior of neuromotor articulation. They therefore proposed a new kinematic descriptor, defined as the probability distribution of the absolute kinematic velocity of the jaw-tongue system, to better capture this behavior. In [20], the same authors showed the effectiveness of vowel articulation kinematic distributions derived from the first two formants for detecting dysarthria in PD voices, comparing them with MFCCs.

In [21], statistical analysis showed that articulation and phonation features can differentiate between HI, PD and Multiple Sclerosis (MS) participants. In [22], eleven different train/test splits of the data were evaluated and compared using evaluation metrics. The best results, an accuracy of 85 % with an RF classifier, were obtained with 75 % of the samples used for training and 25 % for testing.

Braga et al. [23] trained their model on a database of 22 speakers with PD containing 1002 speech lines and a second database of 30 HI containing 785 speech lines. A separate database of sustained vowels from 18 PWP was used to validate the proposed model. The features were extracted following [6]. The best accuracy, 99.8 %, was obtained with an RF classifier using a Leave-One-Subject-Out (LOSO) strategy.

LOSO was also used in [24] to discriminate between three groups of patients: PD, Multiple System Atrophy (MSA) and other Neurological Diseases (ND), such as functional neurological disorder, somatization, dystonia, cervical dystonia, essential tremor and generalized paroxysmal dystonia. MFCC, Perceptual Linear Prediction (PLP) and RASTA-PLP features were used. The authors reported accuracies between 92 % and 100 %.
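The LOSO strategy used in [23] and [24] holds out all samples of one subject at a time, so that no subject contributes to both the training and the test data of the same fold. A minimal numpy-only sketch with a k-nearest-neighbour classifier follows; the data layout (one row per sample plus a `subject_ids` array), the function names and the toy data are assumptions for illustration, not the setup of those studies.

```python
import numpy as np


def loso_splits(subject_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject per fold."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def knn_predict(X_train, y_train, X_test, k=1):
    """Plain k-NN with Euclidean distance and majority vote (labels are non-negative ints)."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


def loso_accuracy(X, y, subject_ids, k=1):
    """Fraction of samples correctly classified across all LOSO folds."""
    correct = 0
    for train_idx, test_idx in loso_splits(subject_ids):
        pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
        correct += int(np.sum(pred == y[test_idx]))
    return correct / len(y)


# Toy data: 4 subjects with 2 samples each; label 0 = healthy, 1 = PD (hypothetical).
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
              [1.0, 0.9], [0.9, 1.0], [0.8, 0.9], [0.9, 0.8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
subject_ids = np.array([1, 1, 2, 2, 3, 3, 4, 4])
print(loso_accuracy(X, y, subject_ids))  # 1.0
```

Splitting by subject rather than by sample matters whenever a subject contributes several recordings or segments: otherwise near-duplicate samples of the same voice can appear on both sides of a split and inflate accuracy.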

Most of the reviewed classification studies were based on parametric representations assuming normal distributions, consisting of the means and standard deviations of features estimated on signal frames from each subject. It must be taken into account, however, that a feature's distribution can only be accurately represented by its mean and standard deviation when that distribution is normal. Otherwise, the mean and standard deviation cannot be considered sufficiently robust estimates of the statistical properties of the features under consideration. Yet it is known beforehand that most voice-derived features do not satisfy the strong requirement of normality. To overcome this problem and preserve the statistical information present in each voice feature, this paper proposes a novel study based on the probability density functions of sets of perturbation, biomechanical and neurological phonation features in PD voice analysis. The main idea consists in using each subject's voice feature distributions, instead of normal-distribution parameters, as independent data samples, and in applying non-parametric tests and estimation methods. In addition, using feature distributions may help deal with data scarcity, a problem for ML classification methods that require large amounts of data to train detection or classification models. Consequently, in this study, each estimated feature is represented by a vector describing its statistical distribution rather than by its mean and/or standard deviation. In the learning stage, these features are used as inputs to be classified as “healthy” or “pathological” by three types of ML classifiers: a K-Nearest Neighbor (K-NN) algorithm, a Support Vector Machine (SVM) and a Random Forest (RF) classifier. The performance of these classifiers is then compared using evaluation metrics.
Finally, to test the efficiency of our approach, its performance is contrasted with that of the same three classifiers using parametric estimates, namely the mean and standard deviation of the proposed features. A comparative analysis of the two approaches shows the substantially better performance of the distribution data with respect to the parametric approach.
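To illustrate why a feature's distribution can carry information that its mean and standard deviation lose, the sketch below builds both representations for two synthetic samples with the same mean (0.5) and standard deviation (0.2) but different shapes. The PDF is estimated with a fixed-bin histogram normalized to integrate to 1, over an assumed common range [0, 1] such as the one produced by min-max normalization. This is one simple density estimator chosen for illustration; the paper's exact estimator and bin count are not given in this section, so `n_bins=10` and the helper names are hypothetical.

```python
import numpy as np


def parametric_vector(values):
    """Normal-distribution parameters: [mean, standard deviation]."""
    v = np.asarray(values, dtype=float)
    return np.array([v.mean(), v.std()])


def pdf_vector(values, n_bins=10, value_range=(0.0, 1.0)):
    """Histogram estimate of the PDF over a fixed range shared by all subjects."""
    density, _edges = np.histogram(values, bins=n_bins, range=value_range, density=True)
    return density


# Synthetic feature values with the same mean (0.5) and std (0.2) but different shapes.
z = np.linspace(-1.0, 1.0, 1000)
flat = 0.5 + 0.2 * (z - z.mean()) / z.std()  # evenly spread over one interval
bimodal = np.repeat([0.3, 0.7], 500)         # two sharp modes

print(parametric_vector(flat), parametric_vector(bimodal))  # nearly identical
print(np.abs(pdf_vector(flat) - pdf_vector(bimodal)).sum())  # clearly non-zero
```

With a fixed bin grid, every subject's feature maps to a vector of the same length, so the distribution vectors can be fed directly to K-NN, SVM or RF classifiers in place of the two parametric values.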

The rest of the paper is organized as follows. Section 2 describes feature extraction. The proposed methodology is explained in Section 3. Results are presented and discussed in Section 4. Finally, conclusions are summarized in Section 5.

Section snippets

Feature extraction

When air flows through the larynx, it induces vocal fold vibration, which is considered the basis of high-quality voice production. During phonation, a mucosal wave is created by the opening and closing of the vocal folds. Fig. 1 shows the four principal phases of one vocal fold cycle [25]. As can be seen, the cycle starts at the contact phase, coinciding with the maximum closure of the glottis (1); then, in (2−4), the air coming from the lungs forces the subglottal rim of the

Methodology

The proposed methodology is presented in Fig. 3. It is composed of six main steps. In the first and second steps, data from several voice recordings of the sustained vowel /a/ are collected on a gender-separated basis. In the third step, voice features are extracted from each set of recordings. The proposed preprocessing approach is applied in the fourth step, and the relevance of the proposed features is evaluated in the fifth step. Finally, to test the performance of our method, a ML

Dataset

The dataset consisted of 42 sustained utterances of the vowel /a/ produced by 16 healthy individuals (8 female and 8 male) aged 45 to 83 years (mean 62.53, standard deviation 10.79) and 26 PD patients (12 male and 14 female) aged 39 to 79 years (mean 63.76, standard deviation 9.69). The time since first PD diagnosis ranged from 0 to 13 years, and the general UPDRS score ranged from 3 to 55 (mean 25, standard deviation 15.10).
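Because the methodology works on a gender-separated basis, it helps to tabulate the class sizes implied by the figures above. The short sketch below only re-derives counts stated in the text (16 healthy: 8 female, 8 male; 26 PD: 14 female, 12 male); the dictionary layout is illustrative.

```python
# Subject counts as reported for the /a/ dataset.
cohort = {
    ("female", "healthy"): 8,
    ("male", "healthy"): 8,
    ("female", "PD"): 14,
    ("male", "PD"): 12,
}

total = sum(cohort.values())
per_gender = {
    gender: {status: cohort[(gender, status)] for status in ("healthy", "PD")}
    for gender in ("female", "male")
}

for gender, counts in per_gender.items():
    n = sum(counts.values())
    pd_share = counts["PD"] / n
    print(f"{gender}: {n} subjects, {counts['PD']} PD ({pd_share:.0%}), {counts['healthy']} healthy")
print("total:", total)
```

Each gender-specific classification task is therefore a mildly imbalanced two-class problem (about 64 % PD among the 22 females and 60 % among the 20 males).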

The utterances were extracted from the

Conclusions

This paper presents a method for classifying subjects as pathological or healthy based on their phonation, using relevance analysis of perturbation, biomechanical and neurological features in Parkinson's disease. The approach relies on data selection by relevance, taking into account whole feature distributions rather than the means and standard deviations of supposedly normal distributions. Graphical histograms showed that the estimated features do not support such a

Funding

No funding was received for this work.

Intellectual property

We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing, we confirm that we have followed the regulations of our institutions concerning intellectual property.

Research ethics

We further confirm that any aspect of the work covered in this manuscript that has involved human patients has been conducted with the ethical approval of all relevant bodies and that such approvals are acknowledged within the manuscript.

IRB approval was obtained (required for studies and series of 3 or more cases)

Written consent to publish potentially identifying information, such as details of the case and photographs, was obtained from the patient(s) or their legal guardian(s).

Authorship

All listed authors meet the ICMJE criteria. 
We attest that all authors contributed significantly to the creation of this manuscript, each having fulfilled criteria as established by the ICMJE.

We confirm that the manuscript has been read and approved by all named authors.

We confirm that the order of authors listed in the manuscript has been approved by all named authors.

Contact with the editorial office

This author submitted this manuscript using his/her account in editorial submission system.

We understand that this Corresponding Author is the sole contact for the Editorial process (including the editorial submission system and direct communications with the office). He/she is responsible for communicating with the other authors about progress, submissions of revisions and final approval of proofs.

We confirm that the email address shown below is accessible by the Corresponding Author, is the

Declaration of Competing Interest

The authors report no declarations of interest.

References (42)

  • J.P. Taylor et al.

    Toxic proteins in neurodegenerative disease

    Science

    (2002)
  • Y. Yunusova et al.

    Articulatory movements during vowels in speakers with dysarthria and healthy controls

    J. Speech Lang. Hear. Res.

    (2008)
  • L. Brabenec et al.

    Speech disorders in Parkinson’s disease: early diagnostics and effects of medication and brain stimulation

    J. Neural Transm.

    (2017)
  • A. Tsanas et al.

    Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease

    IEEE Trans. Biomed. Eng.

    (2012)
  • M.A. Little et al.

    Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease

    IEEE Trans. Biomed. Eng.

    (2009)
  • A. Ozcift

    SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease

    J. Med. Syst.

    (2012)
  • B.E. Sakar et al.

    Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings

    IEEE J. Biomed. Health Inform.

    (2013)
  • M. Hariharan et al.

    A new hybrid intelligent system for accurate detection of Parkinson’s disease

    Comput. Methods Programs Biomed.

    (2014)
  • D. Meghraoui et al.

    Parkinson’s Disease Recognition by Speech Acoustic Parameters Classification

    Modelling and Implementation of Complex Systems

    (2016)
  • D. Meghraoui et al.

    Features dimensionality reduction and multi-dimensional voice processing program to parkinson disease discrimination

  • D. Meghraoui et al.

    Healthy and parkinson voices discrimination based on compensation/normalization cepstral features

  • Cited by (16)

    • Computerized analysis of speech and voice for Parkinson's disease: A systematic review

      2022, Computer Methods and Programs in Biomedicine
      Citation Excerpt :

      Karabayir et al. [104], in a well-described experiment, considered 264 acoustic features. Meghraoui et al. [105] used 8 vocal fold behavior features, including frequency and amplitude perturbations, biomechanical instability and neurological tremor. Sakar et al. [101] represented the samples of each subject with the central tendency and dispersion metrics of dysphonia features, which improved the generalization of the predictive model.
