A survey of nature-inspired algorithms for feature selection to identify Parkinson's disease

https://doi.org/10.1016/j.cmpb.2016.07.029

Highlights

  • We perform a comparative analysis of nature-inspired algorithms for feature selection to aid in distinguishing Parkinson's patients from healthy subjects.

  • Feature selection was applied to datasets of gait and speech of Parkinson's patients.

  • Binary Bat Algorithm outperformed traditional techniques like Particle Swarm Optimization (PSO), Genetic Algorithm and Modified Cuckoo Search Algorithm.

Abstract

Background and Objectives

Parkinson's disease is a chronic neurological disorder that directly affects human gait. It leads to slowness of movement, muscle rigidity and tremors. Analyzing human gait is therefore useful in studies aiming at early recognition of the disease. In this paper we perform a comparative analysis of various nature-inspired algorithms to select the optimal features/variables required for distinguishing affected patients from the rest.

Methods

For the experiments, we use a real-life dataset of 166 people containing both healthy controls and affected people. Following the optimal feature selection process, the dataset is classified using a neural network.

Results and Conclusions

The experimental results show that the Binary Bat Algorithm outperformed traditional techniques like Particle Swarm Optimization (PSO), Genetic Algorithm and Modified Cuckoo Search Algorithm, with a competitive recognition rate on the dataset of selected features. We compare the algorithms using several criteria: cross-validated accuracies, true positive rates, false positive rates, positive predictive values and negative predictive values.
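
The five criteria listed above all derive from a binary confusion matrix. The following is a minimal sketch, assuming labels are encoded as 1 (PD) and 0 (healthy control); the function name `diagnostic_rates` and the toy labels are illustrative and not taken from the paper.

```python
def diagnostic_rates(y_true, y_pred):
    """Return accuracy, TPR, FPR, PPV and NPV for binary labels (1 = PD, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Ratios with an empty denominator are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "tpr": ratio(tp, tp + fn),  # true positive rate (sensitivity)
        "fpr": ratio(fp, fp + tn),  # false positive rate (1 - specificity)
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


# Toy example with made-up labels: 4 patients followed by 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
rates = diagnostic_rates(y_true, y_pred)
```

In a cross-validated setting, these rates would typically be computed per fold and then averaged; that aggregation is left out of the sketch.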

Introduction

Parkinson's disease (PD) is a neurological disorder that progresses with time. People affected by this condition experience a shortage of dopamine, a chemical present in nerve cells of the brain that plays a vital role in coordinating human movement. The main symptoms of Parkinson's are tremor, rigidity, dysphonia and slowness of movement. Besides impaired movement, people with Parkinson's also experience other issues such as tiredness, pain, depression and constipation, which can affect their day-to-day lives. The disease does not directly cause death, but symptoms worsen as time progresses. After a certain period at the severe end of the spectrum, affected patients can no longer walk, drink or talk. The causes of PD are still unknown, and the condition is therefore characterized as idiopathic. While conventional medical thinking holds that PD lacks an external cause, metabolic phenomena such as mitochondrial fatigue and oxidative stress are associated with the death of dopamine-producing neurons [1]. Astiz et al. [2] demonstrated in one study that exposure to pesticides may substantially increase the chances of developing PD. There is currently no cure for Parkinson's, and researchers do not yet know the definitive causes of the condition. Diagnosis and assessment of severity have therefore become very important issues in PD.

Most of the existing methods used for evaluating Parkinson's disease depend largely on human expertise. We describe methods based on studying speech and gait signals of subjects.

  • 1)

    Speech based methods: The Unified Parkinson's Disease Rating Scale (UPDRS) [3] is widely used for quantifying the stages of severity of Parkinson's disease. It consists of a series of tests performed by the patient and rated by the clinician according to predefined guidelines. One popular use of these scores in machine learning is to train models on speech signals of subjects in order to predict each subject's UPDRS score. One such popular dataset was generated at the University of Oxford in collaboration with the National Centre for Voice and Speech, Denver, Colorado, and published on the UCI Machine Learning Repository (UCI-ML).

  • 2)

    Gait based methods: In another direction of work, Howard Lee et al. [4] analyzed different digital image/video processing methods to detect neurological disorders through the assessment of gait. They primarily studied the joint angles and swing distances of subjects observed in selected video frames. The major constraint in this work was the need for special clothing/gear designed for gait analysis so that the images could be properly segmented. This system identified 94% of affected Parkinson's patients correctly, with one false negative, that is, one patient who had the disease was classified as non-Parkinson's. Nixon et al. [5] proposed temporal templates to recognize gait patterns through the silhouette method. In 2013, Klucken et al. [6] aimed to objectively and automatically classify particular stages and signs of PD using a mobile, biosensor-based methodology called Embedded Gait Analysis using Intelligent Technology (eGaIT). eGaIT comprises accelerometers and gyroscopes attached to shoes that record motion signals of gait and leg function. Apart from these methods, quantitative measures based on historical data have also been used for Parkinson's disease detection and recognition. In 2009, Cho et al. [7] proposed a framework that combines multivariate statistical analysis methods, namely Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The results produced by the combination of PCA and LDA were much better than those produced by PCA alone. On a similar note, Jian et al. [8] proposed a better-performing gait recognition method based on KFDA (Kernel-based Fisher Discriminant Analysis) and SVM (Support Vector Machines). In their work, measurements of width and angle were extracted from human motion image sequences.

Although the use of sensors has revolutionized the detection of Parkinson's disease, a major problem is the effort involved in acquiring the specific physical attributes (features). As an example, the Physionet Gait Database for Parkinson's patients, consisting of 19 attributes, was compiled by means of force platforms that measure ground reaction forces and moments, including variables like direction, magnitude and location (known as the center of pressure). In this data-collection process, each foot had 7 sensors underneath to measure force as a function of time. The sampling rate was 100 Hz. Each subject was analyzed for 120 s, and more than 12,000 instances were produced. Other gait databases for Parkinson's detection contain data collected in a similar form.

Feature selection, where a subset of optimal features is selected, helps to reduce the cost of future data acquisition by identifying the minimal set of features that can achieve competitive classification accuracies in conjunction with the right models. Various methods have been proposed in the past for selecting feature subsets for classification. The size of the search space, in other words the number of possible subsets to evaluate, grows exponentially as the number of original features in the dataset increases. Applying a brute-force approach and searching exhaustively for the optimal subset is impractical in most situations. Global heuristic search techniques have therefore attracted the attention of researchers and are widely used in many areas.
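
To make the exponential growth concrete: a dataset with n features has 2^n - 1 non-empty feature subsets. The short sketch below counts them for a few values of n, including the 19 attributes mentioned above for the gait data; the function name is illustrative.

```python
def count_nonempty_subsets(n_features):
    """Number of non-empty subsets of a set with n_features elements."""
    return 2 ** n_features - 1


for n in (10, 19, 30):
    print(n, count_nonempty_subsets(n))
# 10 1023
# 19 524287
# 30 1073741823
```

In a wrapper approach, each candidate subset would normally require training and validating a classifier, which is why exhaustive search becomes impractical well before the counts above are reached.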

Bio-inspired algorithms such as Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and Genetic Algorithms are becoming popular choices for engineering optimization. In the literature, these techniques are sometimes also referred to as nature-inspired algorithms or evolutionary optimization algorithms. Yun et al. [9] proposed novel methods to find optimal feature subsets by using biologically-inspired algorithms. Imani et al. [10] used Ant Colony Optimization (ACO) and Genetic Algorithm for feature selection. In this example, the number of selected features was almost half the size of the original feature set. This nature-inspired approach converges to the optimal solution faster than other competing algorithms applied for feature selection. Yang et al. [11] in 2013 proposed a novel approach for feature selection using a technique called Binary Cuckoo Search (BCS), based on the breeding behavior of cuckoo birds. The experiments were carried out on two datasets, and BCS outperformed the traditional Genetic Algorithm.

The rest of the paper is organized as follows. Section II describes a recent bio-inspired optimization technique called the general Bat Algorithm and its implementation. Section III describes the usage of the Binary Cuckoo Search Algorithm for feature selection. Section IV then gives a detailed approach for selecting an optimal feature subset by Bat and Cuckoo search algorithms. Section V gives a detailed experimental analysis of techniques for the feature selection problem on two datasets. One dataset was taken from the gait database and the other was based on a speech dataset from the University of California, Irvine, Machine Learning Repository (UCI).

Section snippets

Bat algorithm

The Bat Algorithm [12] is a recently introduced nature-inspired meta-heuristic optimization algorithm for solving engineering optimization tasks. It is based on the echo-location behavior of bats. Bats use echo-location to detect prey and avoid obstacles in the dark by emitting ultrasound waves and listening to the echoes reflected from surrounding objects. The following rules are used by the basic Bat Algorithm:

  • i.

    All bats use echo-location to find distance. It is
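
For orientation, the frequency, velocity and position update commonly published for the basic Bat Algorithm can be sketched as follows. This is a minimal sketch rather than the authors' implementation: the function name `bat_step`, the default frequency range, and the omission of the loudness and pulse-rate handling are assumptions made for brevity.

```python
import random


def bat_step(position, velocity, best, f_min=0.0, f_max=2.0, rng=random):
    """One frequency/velocity/position update for a single bat.

    position, velocity and best are equal-length lists of floats;
    best is the best solution found so far by the swarm.
    """
    beta = rng.random()                      # uniform draw in [0, 1)
    freq = f_min + (f_max - f_min) * beta    # pulse frequency for this bat
    # The velocity is adjusted by the offset from the global best, scaled by frequency.
    new_velocity = [v + (x - b) * freq for v, x, b in zip(velocity, position, best)]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity


# A bat sitting on the best solution with zero velocity does not move.
pos, vel = bat_step([1.0, 2.0], [0.0, 0.0], best=[1.0, 2.0])
```

A full implementation would additionally perform a local random walk around the best solution and update each bat's loudness and pulse emission rate when a new solution is accepted.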

Cuckoo search

Cuckoo Search is a novel meta-heuristic technique proposed by Yang and Deb [13] to obtain quicker convergence for continuous optimization problems. It is based on a fascinating reproduction technique of cuckoo birds. These birds never build nests of their own. Instead, they rely on the nests of other host birds for laying their eggs. Most surprisingly, cuckoo eggs mimic the physical characteristics of the host bird's eggs in terms of spot patterns and color. Their strategy of hiding their eggs ends up being

Methodology

When dealing with large datasets containing an enormous number of features, dimensionality reduction can be used as a pre-processing step to reduce the number of features. In building a classifier, much of the work is concentrated on selecting relevant features and deciding how to encode them. In this particular problem of feature selection, we use the Bat Algorithm to find the optimal feature subset. In the binary version of this algorithm, we use a binary string consisting of only 0's and 1's. For
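
The binary-string encoding can be illustrated with a short sketch: each bit marks whether the corresponding feature (column) is kept. The sigmoid transfer rule used here to turn real-valued velocities into bits is an assumption borrowed from common binary swarm variants, not something confirmed by this excerpt; the function names and toy data are illustrative.

```python
import math
import random


def apply_mask(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[value for value, bit in zip(row, mask) if bit == 1] for row in rows]


def binarize(velocity, rng=random):
    """Map real-valued velocities to bits with a sigmoid transfer function (assumed rule)."""
    bits = []
    for v in velocity:
        v = max(-60.0, min(60.0, v))            # clamp to avoid overflow in exp
        prob_one = 1.0 / (1.0 + math.exp(-v))   # sigmoid squashes v into (0, 1)
        bits.append(1 if rng.random() < prob_one else 0)
    return bits


# Toy data: two samples with four features each (made-up values).
rows = [[0.1, 0.2, 0.3, 0.4],
        [1.1, 1.2, 1.3, 1.4]]
mask = [1, 0, 0, 1]             # select the first and last feature
selected = apply_mask(rows, mask)
```

In a wrapper setup, the reduced matrix `selected` would be passed to the classifier, and the resulting validation accuracy would serve as the fitness of the mask.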

Experimental results

This feature selection experiment is run on two different datasets. The Physionet gait dataset for classification of PD patients contains the gait measures of 73 healthy controls and 93 patients with idiopathic PD. Each foot had 7 sensors underneath it that measure force as a function of time. These data [19], [20], [21] were collected by three people in their respective areas of research:

  • a)

    Ga: Galit Yogev et al.

  • b)

    Ju: Hausdorff et al.

  • c)

    Si: Silvi Frenkel-Toledo et al.

To each

Conclusion

In this paper, we addressed the problem of diagnosing Parkinson's disease. Human gait can serve as a primary tool for early detection as well as for recognizing the stage of severity of PD. The Physionet gait database of Parkinson's affected patients was analyzed to identify the attributes/features that directly influence the results obtained in the automatic classification of affected patients from the rest. Speech of the patients affected with PD also have a slight to major

References (22)

  • C.-W. Cho et al.

    A vision-based analysis system for gait recognition in patients with Parkinson's disease

    Expert Syst. Appl

    (2009)
  • Yang

    A binary cuckoo search algorithm for feature selection

    Stud. Comput. Intell

    (2014)
  • X.S. Yang et al.

    Cuckoo search via Lévy flights

    World Congr. Nat. Biologically Inspired Comput. IEEE Publ

    (2009)
  • S. Walton et al.

    Modified cuckoo search: a new gradient free optimization algorithm

    Chaos Solitons Fractals

    (2011)
  • I. Ferrer et al.

    Neuropathology of sporadic Parkinson disease before the appearance of Parkinsonism: preclinical Parkinson disease

    J. Neural. Transm. (Vienna)

    (2011)
  • A. Wang et al.

    Parkinson's disease risk from ambient exposure to pesticides

    Eur. J. Epidemiol

    (2011)
  • P. Martínez-Martín et al.

    Unified Parkinson's disease rating scale characteristics and structure, The Cooperative Multicentric Group

    Mov. Disord

    (1994)
  • H. Lee et al.

    Video analysis of human gait and posture to determine neurological disorders

    EURASIP J. Image Video Proc

    (2008)
  • P.S. Huang et al.

    Human gait recognition in canonical space using temporal templates

    Vis. Image Signal Process. IEE Proc

    (1999)
  • J. Klucken et al.

    Unbiased and mobile gait analysis detects motor impairment in Parkinson's disease, public library of science

    PLoS ONE

    (2013)
  • L.L. Jian Ni

    A gait recognition method based on kfda and svm

    Int. Workshop Intell. Syst. Appl

    (2009)