Prediction of the Unified Parkinson’s Disease Rating Scale assessment using a genetic programming system with geometric semantic genetic operators

https://doi.org/10.1016/j.eswa.2014.01.018

Highlights

  • Assessment of Parkinson’s disease (PD) symptom progression using a CI system.

  • System that includes the concept of semantics in the search process.

  • Results achieved using the largest database of PD speech in existence.

  • Better results than the ones produced by standard GP and other ML methods.

  • Results outperform the best published results achieved using the same dataset.

Abstract

The Unified Parkinson’s Disease Rating Scale (UPDRS) is the most widely used scale for tracking Parkinson’s disease symptom progression. Currently, the tracking process requires a patient to undergo invasive and time-consuming specialized examinations in hospital clinics, under the supervision of trained medical staff. Thus, the process is costly and logistically inconvenient for both patients and clinicians. For this reason, powerful new computational tools, aimed at making the process more automatic, cheaper and less invasive, are increasingly necessary. The purpose of this paper is to investigate the use of an innovative intelligent system based on genetic programming for the prediction of UPDRS assessment, using only data derived from simple, self-administered and non-invasive speech tests. The system we propose is called geometric semantic genetic programming and it is based on recently defined geometric semantic genetic operators. Experimental results, achieved using the largest database of Parkinson’s disease speech in existence (approximately 6000 recordings from 42 Parkinson’s disease patients, recruited in a six-month, multi-centre trial), show the appropriateness of the proposed system for the prediction of UPDRS assessment. In particular, the results obtained with geometric semantic genetic programming are significantly better than the ones produced by standard genetic programming and other state-of-the-art machine learning methods, both on training and on unseen test data.

Introduction

Neurological disorders, including Parkinson’s disease (PD), profoundly affect the lives of patients and their families (Caap-Ahlgren & Dehlin, 2002). PD is a disorder of the central nervous system that leads to severe difficulties with body motions. It is the second most common neurodegenerative disorder after Alzheimer’s disease (de Rijk et al., 2000) and it is estimated that more than one million people in North America alone are affected by it (Lang & Lozano, 1998). Moreover, as explained in Little, McSharry, Hunter, Spielman, and Ramig (2009), because of the rapid increase in the average population age in several countries, and since the risk of developing PD increases after the age of 60 (Van Den Eeden et al., 2013), this number is expected to rise in the next few years. As a direct consequence, the medical care costs for patients with PD are expected to rise in the future (Huse et al., 2005). The currently available therapies aim at improving the functional capacity of the patient for as long as possible; however, they are not able to modify the progression of the neurodegenerative process (Singh, Pillay, & Choonara, 2007). Most people affected by PD will therefore be substantially dependent on clinical intervention.

Tracking PD symptom progression is a complex task. It often relies on a system for measuring the intensity of the symptoms called the Unified Parkinson’s Disease Rating Scale (UPDRS). The UPDRS was developed in an effort to incorporate elements from existing scales and to provide a comprehensive, efficient and flexible way of measuring and monitoring PD-related disability and impairment (Movement Disorder Society, 2003). Prior to its development, multiple scales were used in different hospital clinics and health centers, making comparative assessments difficult. One of the core advantages of the UPDRS is that it was developed as a compound scale to capture multiple aspects of PD. It assesses both motor disability and motor impairment. Of all analogous available clinical scales, the UPDRS is currently the most commonly used one (Ramaker, Marinus, Stiggelbout, & Van Hilten, 2002). It reflects the presence and severity of symptoms, expressing them on a scale from 0 to 176, with 0 representing a healthy state and 176 total disability. The UPDRS contains three sections:

  • Mentation, Behavior and Mood.

  • Activities of daily living.

  • Motor.

The motor section of the UPDRS encompasses tasks such as speech, facial expression, tremor and rigidity, and expresses the severity of the related symptoms on a scale from 0 to 108, where 0 represents a symptom-free state and 108 denotes severe motor impairment.

For many people affected by PD, the specialized medical examinations needed to estimate the severity of their symptoms are difficult and invasive, and they have to be performed by trained medical staff. Thus, as described in Tsanas, Little, McSharry, and Ramig (2010), symptom monitoring is costly and logistically inconvenient for patients and clinicians. All these critical aspects highlight the need for reliable and accurate computational techniques that can estimate the UPDRS automatically and effectively.

In this paper, we present a comparative study of a set of computational methods aimed at predicting the severity of the PD symptoms in their entirety (i.e. including all three sections of the UPDRS) and the severity of the symptoms considered in the motor section of the UPDRS. The studied methods attempt to express these quantities as a function of several other features related to patients. Thus, the application is reduced to two symbolic regression problems, one per dataset. The two datasets contain identical features and differ only in the target values to be predicted. The dataset whose target is the severity of the general PD symptoms (including all three sections of the UPDRS) will be called total-UPDRS from now on, while the one whose target is the severity of the motor symptoms will be called motor-UPDRS.
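The setup described above can be sketched as follows. This is a minimal Python illustration, not the authors' code: the record layout, the key names `motor_updrs` and `total_updrs`, and all numeric values are hypothetical placeholders. It pairs one feature matrix with two target vectors and checks each target against the UPDRS ranges stated earlier (0 to 108 for the motor section, 0 to 176 in total).

```python
# Hypothetical records: keys and values are made-up placeholders, not study data.
records = [
    {"features": [0.1, 0.2, 0.3], "motor_updrs": 20.0, "total_updrs": 30.0},
    {"features": [0.4, 0.5, 0.6], "motor_updrs": 25.0, "total_updrs": 36.0},
    {"features": [0.7, 0.8, 0.9], "motor_updrs": 31.0, "total_updrs": 44.0},
]

# Score ranges as stated in the text.
MOTOR_RANGE = (0.0, 108.0)
TOTAL_RANGE = (0.0, 176.0)


def make_dataset(records, target_key, valid_range):
    """Return (features, targets) for one regression problem.

    Raises ValueError if a target falls outside the stated score range.
    """
    low, high = valid_range
    features, targets = [], []
    for record in records:
        target = record[target_key]
        if not low <= target <= high:
            raise ValueError(f"{target_key}={target} is outside [{low}, {high}]")
        features.append(list(record["features"]))
        targets.append(target)
    return features, targets


# Same features, different targets: two separate regression problems.
motor_X, motor_y = make_dataset(records, "motor_updrs", MOTOR_RANGE)
total_X, total_y = make_dataset(records, "total_updrs", TOTAL_RANGE)
```

Any regression method can then be trained once on `(motor_X, motor_y)` and once on `(total_X, total_y)`, which keeps the two prediction tasks directly comparable.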

In particular, the focus of this paper is on an intelligent system based on genetic programming (Koza, 1992, Poli et al., 2008). We use a recently introduced version of genetic programming, which uses so-called geometric semantic genetic operators. We compare the results obtained with this new version of genetic programming to the ones returned by standard genetic programming and by a set of different state-of-the-art machine learning methods.
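For readers unfamiliar with these operators, the sketch below illustrates how geometric semantic crossover and mutation are commonly defined for real-valued regression in the GSGP literature. It does not reproduce the authors' implementation. Assumptions: programs are plain Python callables rather than trees, the helper `random_function` (a logistic-squashed random linear model) is a hypothetical stand-in for a random tree with outputs in (0, 1), and the mutation step `ms` is an arbitrary value.

```python
import math
import random


def random_function(rng, n_features):
    """Hypothetical stand-in for a random program with outputs in (0, 1)."""
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]

    def tr(x):
        z = sum(w * v for w, v in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))  # logistic squashing

    return tr


def gs_crossover(t1, t2, tr):
    """Offspring output is tr(x) * t1(x) + (1 - tr(x)) * t2(x)."""
    return lambda x: tr(x) * t1(x) + (1.0 - tr(x)) * t2(x)


def gs_mutation(t, tr1, tr2, ms):
    """Offspring output is t(x) + ms * (tr1(x) - tr2(x))."""
    return lambda x: t(x) + ms * (tr1(x) - tr2(x))


def semantics(program, inputs):
    """Semantics of a program: its vector of outputs on the given inputs."""
    return [program(x) for x in inputs]


rng = random.Random(0)
inputs = [[0.1, 0.5], [0.9, 0.2], [0.4, 0.4]]  # made-up input points


def parent_a(x):
    return x[0] + x[1]


def parent_b(x):
    return 2.0 * x[0]


child = gs_crossover(parent_a, parent_b, random_function(rng, 2))
mutant = gs_mutation(parent_a, random_function(rng, 2),
                     random_function(rng, 2), ms=0.1)

child_semantics = semantics(child, inputs)
mutant_semantics = semantics(mutant, inputs)
```

Because the crossover output is a pointwise convex combination of the parents' outputs, the child's prediction on every input lies between those of its parents, and the mutation shifts each prediction by less than `ms`.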

The paper is organized as follows: Section 2 introduces standard genetic programming. Section 3 presents and motivates geometric semantic genetic operators. Section 4 describes the data we used and our experimental settings, and presents a detailed analysis of the results, comparing them with those of several different machine learning techniques. Finally, Section 5 concludes the paper.

Section snippets

Genetic programming

Models lie at the core of any technology in any industry, be it finance, health, manufacturing, services, mining, or information technology. The task of data-driven modeling lies in using a limited number of observations of system variables to infer relationships among these variables. The design of reliable learning machines for data-driven modeling tasks is of strategic importance, as there are many systems that cannot be accurately modeled by classical mathematical or statistical

Geometric semantic operators

In the last few years, GP has been extensively used both in industry and academia (Arcuri and Yao, 2010, Chan et al., 2010, Choi and Choi, 2012, dos Santos et al., 2011, Koza et al., 2008, Moreno-Torres et al., 2013, Ravisankar et al., 2010, Trujillo et al., 2012, Yeun et al., 2000, Wongseree et al., 2007) and it has produced a wide set of results that have been described as human-competitive (Koza, 2010). While these results have demonstrated the appropriateness of GP in tackling real-life

Data set

This study makes use of the recordings described in Goetz et al. (2009) and in Tsanas et al. (2010), where 52 subjects with idiopathic PD were recruited. A subject was diagnosed with PD if he or she had at least two of the following: rest tremor, bradykinesia (slow movement) or rigidity, without evidence of other forms of parkinsonism. The study was supervised by six US medical centers: Georgia Institute of Technology (7 subjects), National Institutes of Health (10 subjects), Oregon Health and Science

Conclusions

Tracking Parkinson’s disease (PD) symptom progression is very complex, and new and powerful computational methods are needed to automate it and make it faster and more reliable. The objective of this paper was to present a computational intelligence method that could outperform the state-of-the-art ones in terms of prediction accuracy of PD symptom progression, automatically discovering insightful relationships between dysphonia measures and the well-known Unified

Acknowledgments

This work was supported by national funds through FCT under contract PEst-OE/EEI/LA0021/2013 and by projects MassGP (PTDC/EEI-CTP/2975/2012), EnviGP (PTDC/EIA-CCO/103363/2008) and InteleGen (PTDC/DTP-FTO/1747/2012), Portugal.

References (41)

  • Y. Yeun et al. (2000). Function approximations by superimposing genetic programming trees: With applications to engineering problems. Information Sciences.

  • A. Arcuri et al. (2010). Co-evolutionary automatic programming for software development. Information Sciences.

  • L. Beadle et al. Semantically driven mutation in genetic programming.

  • P. Boersma (2001). Praat, a system for doing phonetics by computer. Glot International.

  • M. Caap-Ahlgren et al. (2002). Factors of importance to the caregiver burden experienced by family caregivers of Parkinson’s disease patients. Aging Clinical and Experimental Research.

  • M.C. de Rijk et al. (2000). Prevalence of Parkinson’s disease in Europe: A collaborative study of population-based cohorts. Neurologic diseases in the elderly research group. Neurology.

  • C.G. Goetz et al. (2009). Testing objective measures of motor impairment in early Parkinson’s disease: Feasibility study of an at-home testing device. Movement Disorders.

  • S. Haykin (1999). Neural networks: A comprehensive foundation.

  • L. Hoffmann (2009). Multivariate isotonic regression and its algorithms. Wichita State University, College of Liberal...

  • D.M. Huse et al. (2005). Burden of illness in Parkinson’s disease. Movement Disorders.