1 Introduction

Parkinson’s disease (PD) is a long-term neurodegenerative disease whose main symptoms are tremor, rigidity, slowness of movement, and difficulty walking. It currently affects 7 million people worldwide, has a severe socioeconomic effect, and reduces patients’ quality of life. The condition places a significant financial burden on health care systems and society [16].

PD is now known to arise from an interplay of environmental and several genetic factors [11], but there is no known cure, only treatment to reduce symptoms. Treatment consists mainly of medication, surgery, and physical therapy, and recent studies have also shown symptom relief through improved rehabilitation [1]. No conclusive diagnostic test is available yet. Diagnosis currently relies on clinical questionnaires and movement tests, and PD may be missed or misdiagnosed because its symptoms are shared with other diseases and disorders. By the time PD is diagnosed, the disease has often progressed to an advanced stage with motor symptoms and neurophysiological damage.

It is therefore of great importance to develop tools that support an unbiased diagnosis of PD in earlier stages of the disease. Some symptoms commonly appear before the motor symptoms, such as depression, fatigue and weakness, a reduced sense of smell, problems with blood pressure and heart rate, sleep disturbances, and digestive problems [8].

Increasing the detection rate for early cases is an ambitious goal, especially without resorting to novel diagnostic tools. Making a computerized diagnostic test part of the regular screening process would be easier, more accurate, and less prone to bias. Data from such a tool would allow us to model individual abnormalities more accurately and, by comparison with earlier screenings, to make personalized and accurate predictions of disease status and progression. Another approach would be to use proxy data that patients can choose to send for analysis, such as data from a personal health monitor, movement data from a GPS tracker, mobile phone data, or data from a video game.

Our aim is to predict, from motion tracking data, the physician’s clinical ratings of the underlying movement disorders, and to identify which parts of the movement sequences are best suited for this task. The problem has several difficulties, the major ones being: (1) there are few observations compared to the number of variables; (2) the labels we want to predict are ordinal; (3) the classes are imbalanced.

In recent years, the Kinect sensor has been widely used for rehabilitation and physical therapy; Galna et al. presented such an application for PD patients [10]. A review of medical uses of the Kinect sensor is presented in [13]; most of the reviewed work concerns the development and testing of physical therapy systems for various diseases and medical conditions. Of the studies covered in [13], three describe the assessment of conditions, namely facioscapulohumeral muscular dystrophy (FSHD), stroke, and balance in the elderly. As reported in [9], the capabilities of the Kinect are limited; we therefore do not expect to detect or predict low-amplitude tremors or movement disorders related to smaller movements.

We want to predict the scores from the clinically collected movement data and to identify the movement sequences related to PD. Because of the high number of variables, we propose a novel method, sparse ordinal regression. This method builds upon sparse discriminant analysis (SDA) [6] and handles ordinal labels by adapting the data replication method [4] to the sparse setting. We further extend the optimization approaches presented in [3] to sparse ordinal regression. The data replication method transforms an ordinal classification problem into multiple binary classification problems. These binary problems are solved jointly to find a common hyperplane that separates each pair of classes corresponding to adjacent ordinal labels; the hyperplanes for the different classification boundaries differ only in their biases.
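The stacking of the binary problems can be illustrated with a minimal NumPy sketch. It assumes labels coded 1 to K, appends one indicator column per boundary so that each boundary gets its own bias (our reading of the \(p+K-1\) parameter layout), and encodes the two binary classes as a two-column indicator matrix. The function name `replicate_ordinal` is hypothetical, and the exact encoding used in [4] may differ.

```python
import numpy as np

def replicate_ordinal(X, y, K):
    """Stack K-1 binary problems built from ordinal labels y in {1, ..., K}.

    Replica i (i = 1, ..., K-1) encodes boundary i, which separates classes
    1..i from classes i+1..K. Each replica is X with a one-hot block of K-1
    bias columns appended, so all boundaries share the same p weights but
    each gets its own bias. (Hypothetical encoding, for illustration.)
    """
    n, p = X.shape
    X_blocks, targets = [], []
    for i in range(1, K):
        bias_cols = np.zeros((n, K - 1))
        bias_cols[:, i - 1] = 1.0                # switches on bias b_i for replica i
        X_blocks.append(np.hstack([X, bias_cols]))
        targets.append((y > i).astype(int))      # 1 if the sample lies above boundary i
    X_ord = np.vstack(X_blocks)                  # shape (n*(K-1), p+K-1)
    t = np.concatenate(targets)
    Y_ord = np.column_stack([1 - t, t])          # indicator matrix, shape (n*(K-1), 2)
    return X_ord, Y_ord
```

Each original sample appears K-1 times, so the replicated problem has \(n(K-1)\) rows and \(p+K-1\) columns, and the two columns of `Y_ord` match the two entries of \(\varvec{\theta }\in \mathbf {R}^2\) in Eq. 1.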

In recent years, several methods have been proposed for feature selection and classification problems where \(p \gg n\), most notably Sparse Discriminant Analysis (SDA) [6] and Sparse Partial Least Squares for Classification [5]. Other commonly used methods for such problems, not necessarily focused on classification, are the elastic net [18] and sparse principal component analysis [7]. Including an \(l_1\)-norm regularizer in the model formulation performs variable selection as part of the optimization, which lets the user interpret the non-zero parameters of the model. The use of the \(l_1\)-norm regularizer is inspired by the Lasso [15], which uses the \(l_1\)-norm to relax the vector cardinality function in the best-subset selection problem for linear regression.
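For reference, the relaxation can be written as follows, in a standard textbook form where \(k\) and \(t\) are budget parameters introduced here only for illustration and \(\Vert \varvec{\beta }\Vert _0\) counts the non-zero entries of \(\varvec{\beta }\):

$$\min_{\varvec{\beta}} \Vert \varvec{y}-\varvec{X}\varvec{\beta}\Vert_2^2 \;\;\mathrm{s.t.}\;\; \Vert\varvec{\beta}\Vert_0 \le k \qquad\longrightarrow\qquad \min_{\varvec{\beta}} \Vert \varvec{y}-\varvec{X}\varvec{\beta}\Vert_2^2 \;\;\mathrm{s.t.}\;\; \Vert\varvec{\beta}\Vert_1 \le t.$$

The \(l_1\) version is convex, and its Lagrangian form adds a penalty \(\lambda _1\sum _i|\beta _i|\) of the same kind as the one used in Eq. 1.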

Ordinal labels appear in a multitude of applications, e.g., surveys, medical rating scales, and online user reviews. We believe that the methodology could be applied to a variety of other problems in the future.

The main contributions of this paper are a novel game-like framework, the Motor-game, for assessing arm movements in individuals with movement disorders affecting the arms, and a novel method for classification from these data, sparse ordinal regression, which allows us to summarize a whole run in a single score.

2 Methods

We have developed a game-like environment, the Motor-game, which uses the Microsoft Kinect sensor [17] and its associated software framework to track the players’ motion (see Fig. 1) [2].

Fig. 1.

Left: Screenshot from the Motor-game. The player sees their pose reflected as a stick figure and must make the stick figure’s hands hover over the buttons as fast as possible. Right: View from behind a player playing the Motor-game.

The Motor-game is designed to capture a range of hand and arm motions. It has three levels; here we focus on data from the first level, which has 22 tasks. In the first 11 tasks, a button appears on the right side of the screen, and the player needs to react, catch the button, and keep the hand stable there for one second. The following 11 tasks are similar but use the left hand. The buttons appear at the same locations in every playthrough, so hand positions are comparable between playthroughs. The distances between consecutive button positions vary, forcing the player to make both large and small motions. Using the Kinect tracking software, we obtain 30 measurements per second of ten upper-body joints: the hands, wrists, elbows, and shoulders, the center of the shoulders, and the head. A main reason for collecting the data in a game-like environment is to keep the players motivated to perform as well as they can and to make the process more enjoyable, similar to games that have been developed for physiotherapy in PD patients [10].

In [6], Clemmensen et al. presented the sparse optimal scoring (SOS) problem, which is the formulation we employ to solve sparse ordinal regression. SDA can be viewed as a supervised counterpart of sparse principal component analysis (PCA). We seek discriminant vectors that project the data to a lower-dimensional representation while balancing three objectives: minimizing within-class variation, maximizing between-class variation, and selecting features. In PCA, where no labels are available, we instead seek directions that maximize variation. After projection, new samples are traditionally classified according to the nearest class centroid. We reformulate the SOS criterion presented in [6] for ordinal labels:

$$\begin{aligned} &\mathop{\mathrm{arg\,min}}\limits_{\varvec{\theta}\in\mathbf{R}^2,\,\varvec{\beta}_{\text{Ord}}\in\mathbf{R}^{p+K-1}} \Vert \varvec{Y}_{\text{Ord}}\varvec{\theta}-\varvec{X}_{\text{Ord}}\varvec{\beta}_{\text{Ord}}\Vert_2^2 + \lambda_2\varvec{\beta}_{\text{Ord}}^T\hat{\varvec{\varOmega}}\varvec{\beta}_{\text{Ord}} + \lambda_1\sum_{i=1}^{p}|\beta_i|\\ &\mathrm{s.t.}\;\frac{1}{n}\varvec{\theta}^T\varvec{Y}_{\text{Ord}}^T\varvec{Y}_{\text{Ord}}\varvec{\theta} = 1. \end{aligned}\tag{1}$$
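As a sketch of what Eq. 1 computes, the objective and the constraint can be evaluated for given \(\varvec{\theta }\) and \(\varvec{\beta }_{\text {Ord}}\) as below. This assumes NumPy arrays with the shapes defined above and takes n to be the number of rows of \(\varvec{Y}_{\text {Ord}}\) (an assumption); the function names are hypothetical.

```python
import numpy as np

def sos_ordinal_objective(X_ord, Y_ord, theta, beta, Omega_hat, lam1, lam2, p):
    """Value of the objective in Eq. 1 for given theta and beta_Ord.

    Only the first p entries of beta (the original variables) enter the
    l1 penalty; the K-1 trailing biases are not penalized.
    """
    residual = Y_ord @ theta - X_ord @ beta
    fit = residual @ residual                  # squared l2-norm of the residual
    ridge = lam2 * beta @ Omega_hat @ beta     # generalized ridge term
    lasso = lam1 * np.abs(beta[:p]).sum()      # l1 penalty on the first p entries only
    return fit + ridge + lasso

def theta_constraint(Y_ord, theta):
    """Left-hand side (1/n) theta^T Y^T Y theta of the constraint; it must equal 1."""
    n = Y_ord.shape[0]
    return theta @ (Y_ord.T @ Y_ord) @ theta / n
```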

Solving Eq. 1 yields a sparse discriminant vector \({\varvec{\beta }}_\text {Ord}\), which we use to project the data from feature space to a one-dimensional representation. Because the data replication method casts the ordinal problem as a single binary classification problem, we obtain only one discriminant vector \(\varvec{\beta }_{\text {Ord}}\), which simplifies the interpretation of the solution. \(\varvec{\beta }_{\text {Ord}}\) is a vector of length \(p+K-1\), where p is the number of variables and K the number of classes. The first p parameters correspond to the original variables and can be interpreted directly. The remaining \(K-1\) parameters are the biases introduced by the data replication method; they allow us to classify a projected point according to its position relative to the biases.
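One way to turn the projection and the biases into a class prediction is to count how many boundaries a point lies above. The sketch below assumes NumPy, a projection oriented so that larger scores mean higher classes, and a hypothetical rule that replica i counts as "above boundary i" when \(\varvec{x}^T\varvec{w}+b_i\) exceeds a threshold (0 by default); it is an illustration, not necessarily the exact decision rule used in our experiments.

```python
import numpy as np

def predict_ordinal(X, beta_ord, p, threshold=0.0):
    """Predict ordinal classes 1..K from beta_Ord = [w, b_1, ..., b_{K-1}].

    Hypothetical decision rule: a sample is above boundary i when
    x^T w + b_i > threshold; its class is 1 plus the number of such boundaries.
    """
    w, b = beta_ord[:p], beta_ord[p:]
    z = X @ w                                       # one-dimensional projection
    above = (z[:, None] + b[None, :]) > threshold   # (n, K-1) boundary decisions
    return 1 + above.sum(axis=1)                    # count the boundaries crossed
```

The count is a meaningful class index only when the boundaries are nested, which under this convention means \(b_1 \ge b_2 \ge \ldots \ge b_{K-1}\).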

Clemmensen et al. [6] show that for a given \({\varvec{\beta }}_\text {Ord}\) one can find \({\varvec{\theta }}\) in polynomial time. For a given \({\varvec{\theta }}\), the problem is an elastic net problem that can be solved with the LARS-EN algorithm [18]. We instead approach the optimization with proximal gradient (PG) methods and the alternating direction method of multipliers (ADMM), using the soft-thresholding operator to handle the sparse regularizer in the same manner as [3].
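To make the PG idea concrete, the following sketch solves the elastic net subproblem for a fixed \(\varvec{\theta }\) with the basic proximal gradient method (ISTA) and a constant step size 1/L. It is a generic textbook instance assuming NumPy, not the specific PG or ADMM variants of [3]; the names `soft_threshold` and `elastic_net_pg` and the iteration count are hypothetical. Here `r` plays the role of \(\varvec{Y}_{\text {Ord}}\varvec{\theta }\), and all coordinates are penalized.

```python
import numpy as np

def soft_threshold(v, tau):
    """Soft-thresholding operator: the proximal operator of tau * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def elastic_net_pg(X, r, Omega, lam1, lam2, n_iter=500):
    """Minimize ||r - X beta||^2 + lam2 beta^T Omega beta + lam1 ||beta||_1 with ISTA."""
    # Lipschitz constant of the gradient of the smooth part
    # (np.linalg.norm(., 2) of a matrix is its largest singular value).
    L = 2.0 * (np.linalg.norm(X, 2) ** 2 + lam2 * np.linalg.norm(Omega, 2))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (r - X @ beta) + 2.0 * lam2 * Omega @ beta
        beta = soft_threshold(beta - grad / L, lam1 / L)   # gradient step, then prox
    return beta
```

With an orthonormal design and \(\varvec{\varOmega }=\varvec{I}\), the solution reduces to soft-thresholding each entry of `r` by \(\lambda _1/2\) and dividing by \(1+\lambda _2\), which is a convenient sanity check.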

A natural assumption for an ordinal classifier of K classes is to have \(K-1\) non-intersecting classification boundaries, where boundary i separates classes 1 to i from classes \(i+1\) to K. In our case, this means finding a single hyperplane and a set of biases that shift the hyperplane between classes. We extend the data replication method of [4] to the sparse setting by adapting the optimization so that it does not regularize these new bias parameters.

We construct a new data matrix \(\varvec{X}_{\text {Ord}}\) and labels \(\varvec{Y}_{\text {Ord}}\) according to the data replication method. We then define a new \((p+K-1)\times (p+K-1)\) regularization matrix \(\varvec{\hat{\varOmega }}\).

$$\varvec{\hat{\varOmega}} := \begin{bmatrix} \varvec{\varOmega} & 0\\ 0 & 0 \end{bmatrix},\qquad \varvec{\beta}_{\text{Ord}}^T := \begin{bmatrix} \beta_1 & \beta_2 & \ldots & \beta_p & b_1 & b_2 & \ldots & b_{K-1} \end{bmatrix} \tag{2}$$

where \(\varvec{\varOmega }\) is a \(p\times p\) positive semi-definite regularization matrix for the parameters corresponding to the p original variables. The final adjustment concerns the \(l_1\)-norm in Eq. 1: in the soft-thresholding step of the ADMM and PG algorithms used to find \(\varvec{\beta }_{\text {Ord}}\), we apply soft-thresholding only to the first p elements.
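The two modifications can be sketched in NumPy as follows: `omega_hat` builds the block matrix of Eq. 2, and `partial_soft_threshold` is a proximal step that shrinks only the first p entries. Both names are hypothetical; in a PG iteration such as ISTA, the second function would take the place of the usual soft-thresholding of all coordinates.

```python
import numpy as np

def omega_hat(Omega, K):
    """Embed the p x p matrix Omega in the (p+K-1) x (p+K-1) matrix of Eq. 2,
    leaving the K-1 bias parameters unregularized (zero rows and columns)."""
    p = Omega.shape[0]
    out = np.zeros((p + K - 1, p + K - 1))
    out[:p, :p] = Omega
    return out

def partial_soft_threshold(v, tau, p):
    """Soft-threshold the first p entries of v; the trailing biases pass through unchanged."""
    out = v.copy()
    out[:p] = np.sign(v[:p]) * np.maximum(np.abs(v[:p]) - tau, 0.0)
    return out
```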

The resulting \(\varvec{\beta }_{\text {Ord}}\) vector is shown in Eq. 2. Its first p elements form a traditional discriminant vector, followed by \(K-1\) biases, denoted \(b_i\), for \(i\in \{1,2,...,K-1\}\). The proofs in [3] that the algorithms converge to stationary points extend naturally to our approach.

3 Data and Experiments

We conducted a study in which we collected data from 63 individuals, of whom 33 were healthy controls and 30 were PD patients. Each participant played the Motor-game twice; the first playthrough was a trial run to become familiar with the game. Motion tracking data were collected during the playthroughs. A physician then evaluated the participants on various rating scales; here we are concerned with the Simpson-Angus Scale (SAS) [14], in particular item 4, which involves elbow rigidity. Furthermore, the PD patients were evaluated on the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) [12], where we focus on items 3.3b (rigidity, right arm), 3.4a (finger tapping, right hand), and 3.5a (hand movements, right hand). We chose these items because they were thought a priori to correspond most closely to the data from the Motor-game: items reflecting motor symptoms in the hands and arms were included, and items were excluded if too few participants were affected. See Fig. 2 for the prevalence and severity of the observed motion conditions in the data. We refer to the ratings as clinical scores. A more detailed description of the cohort, the dataset, and the Motor-game can be found in [2].

Fig. 2.

Prevalence of labels in the dataset for the conditions we focus on. From the left, the first three plots correspond to MDS-UPDRS items 3.3b (rigidity, right arm), 3.4a (finger tapping, right hand), and 3.5a (hand movements, right hand). The last plot corresponds to elbow rigidity on the SAS scale.

To analyze the participants’ movements, we used the tracked positions of their wrists, and we denote by \(x_{ji}\) the position at time point j in task i. For the first 11 tasks, we used the vertical position of the right wrist in avatar screen coordinates. We chose this coordinate because the avatar is scaled according to an initial estimate of the player’s arm length, which makes on-screen positions comparable between players. For the following 11 tasks, we used the corresponding coordinate for the left wrist. For each of the 22 tasks, we used the measurements from the first second of play, which is enough time for the player to respond and start moving. Fig. 3 contrasts a fast-reacting and a slow-reacting participant. In total, this yields \(p=20\times 22=440\) variables per participant. We denote by \(m_{i_S}\) the mean of the first three measurements of task i and by \(m_{i_E}\) the mean of the last three measurements of task i.

$$\tilde{x}_{ji} := \frac{x_{ji}-m_{i_S}}{|m_{i_S}-m_{i_E}|} \tag{3}$$

We then scale the j-th measurement \(x_{ji}\) of task i as in Eq. 3. Because the starting and end positions vary, this scaling makes the trajectories comparable and the data more robust when comparing the participants’ reactions.
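A minimal sketch of the feature construction, assuming NumPy and one 1-D array of vertical wrist positions per task (right wrist for tasks 1 to 11, left wrist for tasks 12 to 22). The window of 20 samples per task is inferred from \(p=20\times 22=440\), and we assume \(m_{i_E}\) is taken over the last three samples of that window; the function name is hypothetical.

```python
import numpy as np

def task_features(positions, n_samples=20, n_edge=3):
    """Build one participant's feature vector from per-task wrist trajectories.

    Each task is truncated to its first n_samples measurements and scaled as in
    Eq. 3, using the means of the first and last n_edge measurements.
    """
    feats = []
    for x in positions:
        x = np.asarray(x[:n_samples], dtype=float)
        m_start = x[:n_edge].mean()        # m_{i_S}
        m_end = x[-n_edge:].mean()         # m_{i_E}
        # Eq. 3; undefined (division by zero) if m_start == m_end.
        feats.append((x - m_start) / abs(m_start - m_end))
    return np.concatenate(feats)           # length n_samples * number of tasks
```

The scaling is undefined when \(m_{i_S}=m_{i_E}\), i.e., when the hand ends where it started; such tasks would need special handling.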

Fig. 3.

Data used for the experiment: vertical position of two subjects’ hands over the first second of each of the 22 tasks. Left: a participant who generally reacts fast. Right: a slower-reacting participant.

We normalize the data before applying sparse ordinal regression by subtracting the mean of each variable and scaling it to unit standard deviation. We report the balanced accuracy for leave-one-out cross-validation (LOO-CV), where the regularization parameters \(\lambda _1\) and \(\lambda _2\) from Eq. 1 are chosen from the set \(\{0.1,0.01,0.001\}\). We perform this experiment for the four labels shown in Fig. 2. Note that the three MDS-UPDRS items were only measured for the PD patients; the controls were therefore assumed to have a score of zero.
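The evaluation protocol can be sketched as below, assuming NumPy. The callable `fit_predict` is a hypothetical placeholder for fitting sparse ordinal regression with given \(\lambda _1\), \(\lambda _2\) and predicting the held-out participant, and the helper names are likewise illustrative. Balanced accuracy is computed as the mean per-class recall, which compensates for the class imbalance noted in the introduction.

```python
import itertools
import numpy as np

def standardize(X):
    """Center each variable and scale it to unit standard deviation."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant variables
    return (X - X.mean(axis=0)) / sd

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    classes = np.unique(y_true)
    return np.mean([np.mean(y_pred[y_true == c] == c) for c in classes])

def loo_grid_search(X, y, fit_predict, grid=(0.1, 0.01, 0.001)):
    """Leave-one-out CV over all (lam1, lam2) pairs drawn from `grid`.

    fit_predict(X_train, y_train, X_test, lam1, lam2) -> predicted labels
    is a hypothetical user-supplied wrapper around the model.
    Returns (best balanced accuracy, (lam1, lam2)).
    """
    Xs = standardize(X)
    n = len(y)
    best = (-np.inf, None)
    for lam1, lam2 in itertools.product(grid, grid):
        preds = np.empty(n, dtype=y.dtype)
        for i in range(n):
            train = np.arange(n) != i      # hold out participant i
            preds[i] = fit_predict(Xs[train], y[train], Xs[i:i + 1], lam1, lam2)[0]
        score = balanced_accuracy(y, preds)
        if score > best[0]:
            best = (score, (lam1, lam2))
    return best
```

Because the best (\(\lambda _1\), \(\lambda _2\)) pair is chosen on the same LOO predictions that are scored, the returned accuracy is optimistic; a nested cross-validation would avoid this.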

4 Results

The LOO-CV balanced accuracy ranged from 38.8% for hand movements (right) to 48.1% for elbow rigidity. The corresponding confusion matrices are shown in Table 1. The predictions are somewhat accurate, although LOO-CV most likely overestimates the true accuracy, not least because the best-performing regularization parameters were selected on the same cross-validation. The best predictions for class zero are obtained for MDS-UPDRS 3.4a and SAS-4. Because the controls were not rated on MDS-UPDRS, we assumed they had a score of zero on these items; this may not be entirely correct, as a few individuals in the control group had a score of one on SAS-4.

Table 1. Confusion matrices for the predictions (with the best-performing regularization parameters) for each participant left out in the leave-one-out cross-validation. Most predictions are concentrated around the correct label, but most of the classifiers have difficulty with the higher labels.

5 Conclusions

We have presented a novel approach for assessing the severity of upper-body motor symptoms in PD. The novelty lies in the game-like environment, which has been shown to work both in the clinic and in the patient’s home, and in sparse ordinal regression for predicting the severity of motion disturbances. Longitudinal studies are needed to further establish the potential of this approach. Monitoring movement alongside the symptoms that appear before the motor symptoms could lead to novel tools for the early detection of PD.