Abstract
The growth of existing lesions and the development of new lesions in MRI are markers of new disease activity in Multiple Sclerosis (MS) patients. Successfully predicting future lesion activity could lead to a better understanding of disease worsening, as well as to prediction of treatment efficacy. We introduce the first fully automatic, probabilistic framework for the prediction of future lesion activity in relapsing-remitting MS patients based only on baseline multi-modal MRI, and use it to successfully identify responders to two different treatments. We develop a new Bag-of-Lesions (BoL) representation of patient images based on a variety of features extracted from lesions. A probabilistic codebook of lesion types is created by clustering features using Gaussian mixture models. Patients are represented as probabilistic histograms of lesion types. A Random Forest classifier is trained to automatically predict future MS activity up to two years ahead from the patient’s baseline BoL representation. The framework is trained and tested on a large, proprietary, multi-centre, multi-modal clinical trial dataset of 1048 patients. In 50-fold cross-validation, our framework compares favourably to several other classifiers. Automated identification of responders in two different treated groups of patients yields sensitivities of 82% and 84% and specificities of 92% and 94%, respectively, suggesting that this is a promising approach towards personalized treatment for MS patients.
1 Introduction
Multiple Sclerosis (MS) is an inflammatory, demyelinating disease of the central nervous system that commonly affects young adults and has no currently known cure [1]. Magnetic Resonance Imaging (MRI) is used to diagnose the disease and to monitor its activity and progression, since one of its hallmarks is the presence of lesions that are visible in MRI. The number of new or enlarged T2 lesions is a marker of MS activity, and lesion volume is often used to quantify accumulated “disease burden” [2]. For relapsing-remitting MS (RRMS), these measurements have become essential in the evaluation of new treatments through clinical trials, where therapy success is often measured as a reduction in the number of new lesions. Hence, predicting future lesion activity in MRI could lead to a better understanding of disease worsening and could also help evaluate treatment efficacy in clinical trials. However, automatic prediction is challenging because the MRI of patients with MS varies widely across the population, with varying numbers, sizes and shapes of lesions throughout the brain. The effect of these variable lesion characteristics on patients’ outcomes is unknown, which makes this problem a natural candidate for automatic data mining and machine learning techniques. The longitudinal course of RRMS is also highly variable across the population: new lesions can appear and disappear, grow or remain stable over time, for reasons that are not well understood. As a result, a unified static and dynamic model across the population is difficult to develop. While the detection of Gadolinium-enhancing lesions has been shown to be a good indicator that a patient’s disease is currently active, administering Gadolinium has important side effects for patients.
Several automatic prediction methods address the conversion of patients with preliminary symptoms to MS, rather than the disease dynamics of patients already known to have MS, using logistic regression on a number of clinical indicators [3] and, more recently, deep learning [4, 5]. Other efforts have predicted long-term clinical outcomes [6].
In this work, we develop a fully automatic, probabilistic machine learning framework that models the variability of lesions in the multi-modal MRI of patients with RRMS, with three objectives: (1) automatic identification of lesion types across the population, (2) probabilistic prediction of new lesion activity two years in the future based only on baseline multi-modal MRI, and (3) automatic identification of responders to treatment using activity prediction models learned separately for untreated and treated groups. Building on the success of the Bag-of-Words model for unsupervised categorization in computer vision [7], we develop a novel unsupervised Bag-of-Lesions (BoL) model for brain image representation in the context of MS. The method first clusters previously labelled lesions based on a variety of image-based features (e.g. texture, tissue priors from an atlas). This yields a codebook of lesion types. Lesions are represented probabilistically over codewords, and patients are represented as a “Bag-of-Lesions” based on probabilistic histograms of lesion codewords. This permits the automatic, unsupervised grouping of images through histogram clustering. Experiments on a proprietary dataset of 1048 patients, acquired during a large, multi-centre, multi-scanner clinical trial, show that the baseline BoL representation, combined with a random forest classifier, can accurately predict patient lesion activity two years in the future, where activity is defined as the presence of new or enlarged T2 lesions. In 50-fold cross-validation, our results compare favourably to Support Vector Machine (SVM) and Nearest Neighbour classifiers, as well as to a simpler Naive Bayes classifier based on counts of lesions of different sizes. We also use this framework to automatically identify responders in two different treated groups of patients, with sensitivities of 82% and 84% and specificities of 92% and 94%, respectively.
2 Proposed Method
At baseline, each RRMS patient has a set of multi-channel MRI, \(\varvec{I}\), and a set of L coarsely labelled lesions obtained either automatically (e.g. using [8]) or manually. Our first objective is to model the variability of lesions and to develop a robust, data-driven categorization of lesions into a finite set of types. To obtain such a representation, lesions are first divided into coarse size bins. Each lesion is then described by a set of vector-valued, intensity-based features \(f_x\). In this work, we use four kinds of features: RIFT [10] and Local Binary Patterns (LBP) [11] at varying window sizes, to encode the texture of the lesion and surrounding tissues; a probabilistic healthy tissue class prior, to encode tissue context (represented by the mean and variance, over the voxels labelled as lesion, of the healthy tissue prior probabilities from an atlas); and intensity features (the mean and variance of the intensity of the lesion voxels). Other features can be added as desired. Lesion features are grouped by size bin and modelled using Gaussian mixture models (GMMs), one per feature (Note 1), whose components have full covariance matrices. Each mixture is learned in the standard fashion using Expectation Maximization (EM), and the Bayesian Information Criterion (BIC) is used to determine its number of components, \(n_x\). We refer to the components of these GMMs, denoted \(f_{x,j}, j=1,\dots ,n_x\), as feature-types.
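The per-feature codebook step can be sketched with scikit-learn’s `GaussianMixture`, fitting full-covariance mixtures by EM and keeping the component count with the lowest BIC. This is a minimal sketch under stated assumptions, not the authors’ implementation: the helper name `fit_feature_gmm`, the search range `max_components` and the toy 2-D data are hypothetical; in the framework, one such model would be fitted per feature (RIFT, LBP, prior, intensity) on real lesion feature vectors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_feature_gmm(features, max_components=10, seed=0):
    """Hypothetical helper: fit full-covariance GMMs with 1..max_components
    components by EM and return the one with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed).fit(features)
        bic = gmm.bic(features)  # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model


# Toy stand-in for one feature: 300 lesions with 2-D features from 3 clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

gmm = fit_feature_gmm(X, max_components=6)
n_x = gmm.n_components        # number of feature-types selected by BIC
c = gmm.predict_proba(X)      # c[i, j] = P(f_{x,j} | f_x(I, L_i))
```

The rows of `predict_proba` are the soft assignments c(x, j, i) that feed the lesion codewords.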
For lesion \(L_i\), let \(f_{x}(\varvec{I}, L_i), x=1,\dots ,4\) denote the features extracted for this lesion, and let \(c(x,j,i)=P(f_{x,j}|f_{x}(\varvec{I}, L_i)), j=1,\dots ,n_x\), be the posterior probability of feature-type \(f_{x,j}\) given the lesion’s x-th feature. We construct the Cartesian product of the feature-types, which has \(\prod _{x=1}^4 n_x\) elements, and consider each of these elements a lesion type. For each element \((j_1,\dots ,j_4)\), the product \(\prod _{x=1}^{4} c(x,j_x,i)=c(1,j_1,i)\cdot c(2,j_2,i)\cdot c(3,j_3,i)\cdot c(4,j_4,i)\) is the entry of lesion i’s codeword for that lesion type. The use of this product encodes a conditional independence assumption: feature-types are considered conditionally independent given the lesion. We then collect the codewords of all lesions. Finally, a patient’s representation is a probabilistic histogram of the lesion types present in their brain scans, referred to as a Bag-of-Lesions representation (by analogy to the Bag-of-Words representation used in text and image processing). An overview of our framework can be found in Fig. 1.
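The codeword construction amounts to an outer product of a lesion’s per-feature posteriors, flattened over the Cartesian product of feature-types. The sketch below assumes that a patient’s histogram is the sum of their lesions’ codewords (the text does not state the aggregation or any normalization) and ignores the size grouping; `lesion_codeword`, `bag_of_lesions` and the posterior values are hypothetical.

```python
from functools import reduce

import numpy as np


def lesion_codeword(posteriors):
    """posteriors: one 1-D array per feature x holding c(x, j, i), j = 1..n_x.
    Returns a vector over all prod_x n_x lesion types whose entry for
    (j_1, ..., j_4) is prod_x c(x, j_x, i) (conditional independence)."""
    return reduce(np.multiply.outer, posteriors).ravel()


def bag_of_lesions(patient_posteriors):
    """Assumed aggregation: sum the codewords of all of a patient's lesions."""
    return np.sum([lesion_codeword(p) for p in patient_posteriors], axis=0)


# Hypothetical posteriors for two lesions; n_x = (2, 3, 2, 2) -> 24 lesion types.
lesion_a = [np.array([0.9, 0.1]), np.array([0.2, 0.7, 0.1]),
            np.array([0.5, 0.5]), np.array([1.0, 0.0])]
lesion_b = [np.array([0.3, 0.7]), np.array([0.0, 0.0, 1.0]),
            np.array([0.8, 0.2]), np.array([0.4, 0.6])]

bol = bag_of_lesions([lesion_a, lesion_b])
print(bol.shape)  # (24,)
```

Because each posterior vector sums to one, each codeword also sums to one, so this histogram’s total equals the patient’s lesion count; dividing by that count would give a normalized alternative.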
Fig. 1. (a) Learning the Bag-of-Lesions from training data. Lesions are first separated by size. Features (e.g. RIFT) are extracted from each lesion in the database. Each feature is modelled by a separate GMM, with each component referred to as a feature-type. Each lesion codeword is the combination of feature-types. (b) Representing a new patient. The lesion codeword is determined for all of the patient’s lesions, and the patient is represented by a probabilistic histogram of lesion types.
Because patients are represented as distributions over lesion types, groups of similar patients can also be found by automatic clustering using EM, with the optimal number of groups selected automatically using the Bayesian Information Criterion (BIC). To assess how likely a new test patient is to belong to each group, we compute the Mahalanobis distance from the patient’s BoL to each group. In this way we automatically learn patterns of lesion presentation across the population.
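Assigning a new patient to patient groups can be sketched by fitting a full-covariance GMM to the patients’ BoL vectors and measuring the Mahalanobis distance to each component. For brevity the number of groups is fixed at two here (BIC selection, as in the codebook step, would be used instead); the helper name `mahalanobis_to_groups` and the low-dimensional toy data are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def mahalanobis_to_groups(gmm, x):
    """Mahalanobis distance from BoL vector x to each group (GMM component),
    using that group's own full covariance matrix."""
    dists = []
    for mean, cov in zip(gmm.means_, gmm.covariances_):
        diff = x - mean
        dists.append(np.sqrt(diff @ np.linalg.solve(cov, diff)))
    return np.array(dists)


# Toy 3-D "BoL" vectors for two hypothetical patient groups.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[5.0, 1.0, 0.0], scale=0.5, size=(50, 3))
group_b = rng.normal(loc=[0.0, 2.0, 6.0], scale=0.5, size=(50, 3))
patients = np.vstack([group_a, group_b])

groups = GaussianMixture(n_components=2, covariance_type="full",
                         random_state=0).fit(patients)
new_patient = np.array([4.8, 1.1, 0.2])
d = mahalanobis_to_groups(groups, new_patient)
closest = int(np.argmin(d))   # index of the most similar patient group
```

With real, high-dimensional and sparse histograms the covariance estimates may be near-singular; `GaussianMixture`’s `reg_covar` parameter adds a small constant to the covariance diagonal to keep them invertible.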
2.1 Activity Prediction
The appearance of new lesions or the enlargement of existing lesions can be used as a biomarker of focal inflammatory activity, which is associated with relapses in RRMS. We seek a probabilistic prediction of future activity based on the baseline BoL representation, \(P(A=1|BoL)\). To this end, we train random forest classifiers for \(P(A|BoL)\) on different subsets of lesion types. Lesion types are progressively eliminated using backward elimination [9]: at each iteration, the 20% least informative remaining types (as measured by their decrease in Gini impurity across all nodes of all trees) are removed, and prediction accuracy is evaluated on a retrained random forest. The set of lesion types that yields the highest prediction accuracy is kept in the final model. The final prediction is computed by averaging the activity probabilities predicted by the individual trees. Because the dataset is imbalanced, with far fewer inactive than active patients, the training error weights the two types of misclassification differently, according to the proportion of examples in each class.
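The lesion-type selection loop can be sketched with scikit-learn’s `RandomForestClassifier`, whose `feature_importances_` are impurity-based (Gini by default) and whose `predict_proba` averages the per-tree class probabilities. Assumptions, not taken from the paper: the helper name `backward_eliminate`, the number of trees, the 5-fold inner cross-validation used to score each subset, `class_weight="balanced"` as the class-proportion weighting, and the synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def backward_eliminate(X, y, drop_frac=0.2, n_trees=50, cv=5, seed=0):
    """Repeatedly drop the drop_frac least important lesion types and return
    the subset (column indices) with the highest cross-validated accuracy."""
    keep = np.arange(X.shape[1])
    best_keep, best_score = keep, -np.inf
    while True:
        # Class weights inversely proportional to class frequencies.
        rf = RandomForestClassifier(n_estimators=n_trees, class_weight="balanced",
                                    random_state=seed)
        score = cross_val_score(rf, X[:, keep], y, cv=cv).mean()
        if score > best_score:
            best_keep, best_score = keep, score
        if len(keep) == 1:
            return best_keep, best_score
        rf.fit(X[:, keep], y)                        # retrain on current subset
        order = np.argsort(rf.feature_importances_)  # least important first
        n_drop = max(1, int(drop_frac * len(keep)))
        keep = np.sort(keep[order[n_drop:]])


# Synthetic stand-in: 200 patients x 10 lesion types; only types 0 and 1 matter.
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

selected, acc = backward_eliminate(X, y)
final_rf = RandomForestClassifier(n_estimators=50, class_weight="balanced",
                                  random_state=0).fit(X[:, selected], y)
p_active = final_rf.predict_proba(X[:, selected])[:, 1]  # P(A=1 | BoL)
```

Keeping the best subset seen, rather than stopping at the first drop in accuracy, mirrors the rule of preserving the lesion types that give the highest accuracy.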
2.2 Identifying Responders to Treatment
Ground truth information on which patients in a treatment group have definitively responded to treatment is rarely available. In this work, responders to treatment are defined as patients who are predicted, with high confidence, to have new lesions or lesion growth two years from baseline if untreated, but who instead had no lesion activity. This can act as a proxy for ground truth, based on the assumption that treatment must have halted disease activity. To this end, we fit activity prediction models for the untreated and treated populations separately. To determine whether a new patient is a responder to a drug, we compute, from the patient’s baseline Bag-of-Lesions representation, the probability of future activity \(P_{untr}(A=1|BoL)\) under the “untreated” model and the probability of no activity \(P_{treat}(A=0|BoL)\) under the model learned from treated patients. A patient is considered a responder if these probabilities exceed thresholds \(\alpha \) and \(\beta \), respectively; that is, the two models disagree with high confidence.
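The decision rule itself is a pair of thresholds on the two models’ outputs. In this sketch `predict_responders` is a hypothetical helper, and its inputs are assumed to be the class-1 and class-0 columns of `predict_proba` from the untreated and treated random forests, respectively.

```python
import numpy as np


def predict_responders(p_untr_active, p_treat_inactive, alpha=0.8, beta=0.8):
    """Responder if P_untr(A=1|BoL) > alpha and P_treat(A=0|BoL) > beta,
    i.e. the untreated and treated models disagree with high confidence."""
    p_untr_active = np.asarray(p_untr_active, dtype=float)
    p_treat_inactive = np.asarray(p_treat_inactive, dtype=float)
    return (p_untr_active > alpha) & (p_treat_inactive > beta)


# With fitted scikit-learn forests (hypothetical names), the inputs would be
#   untreated_rf.predict_proba(bol)[:, 1] and treated_rf.predict_proba(bol)[:, 0]
print(predict_responders([0.90, 0.95, 0.40], [0.85, 0.50, 0.90]))
# [ True False False]
```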
3 Experiments and Results
To validate the framework for characterizing lesion types and patient groups, predicting future lesion activity and classifying responders to treatment, we conducted experiments on a large, proprietary dataset of real MS patient brain images from a multi-centre, multi-scanner clinical trial. The data comprised 1048 RRMS patients, each with four MR sequences available: T1, T2, PD and FLAIR. Each volume had a resolution of 1 mm \(\times \) 1 mm \(\times \) 3 mm. Pre-processing included brain extraction [12], bias field inhomogeneity correction using N3 [13], Nyul intensity normalization, and registration of all images to MNI space. The clinical trial dataset included: (1) T2 lesion label masks for each patient at baseline, and (2) new disease activity labels for each patient, defined as the presence of any new or enlarging T2 lesions 24 months after baseline. The baseline T2 lesion masks were obtained through a semi-manual process in which a trained expert reader corrected the result of an in-house automated segmentation. The new and enlarged T2 lesion masks were obtained through expert validation of an automatic longitudinal MS lesion segmentation framework [14]. Patients were treated in a double-blind study with either a placebo or one of two drugs, divided as follows: 259 untreated (placebo), 280 Drug A and 259 Drug B. The trial did not achieve its primary endpoint, due to insufficient evidence of effectiveness across the entire cohort. However, there was a clear trend towards a treatment response for some patients in the trial, making the task of automatically finding responders in this dataset both challenging and compelling.
A total of 98,106 lesions were used to build a comprehensive lesion codebook. Following the clinical protocol, lesions of fewer than three voxels were omitted. Lesions were subdivided into four coarse size groups: tiny (3–10 voxels), small (11–25 voxels), medium (26–100 voxels) and large (101+ voxels). For each lesion, RIFT features were extracted at three scales (3 mm, 6 mm, 9 mm) with eight bins for gradients in two dimensions. LBP features were obtained by binarizing intensity differences around central voxels at fixed radii (1 mm, 2 mm, 3 mm). RIFT and LBP therefore capture the texture of the lesions and of their surrounding tissues (e.g. see Fig. 1), which makes the representation less sensitive to minor under- or over-segmentation in the lesion labelling. Probabilistic healthy tissue context was obtained through registration to MNI space, yielding prior probabilities of white matter (WM), gray matter (GM), cerebrospinal fluid (CSF) and partial volume (PV) at the interface of GM and CSF. The mean and variance of these probabilities over the lesion voxels were used as features. Intensity was encoded as the mean and variance of the intensity of each image modality across each lesion. Examples of lesions drawn from several types are shown in Fig. 2. Patients clustered automatically based on their BoL representation were found to exhibit similar lesion distributions (see Fig. 3).
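The size grouping and the mean/variance features (used for both the tissue priors and the intensities) can be sketched in NumPy. RIFT and LBP extraction are omitted; the helper names, the channel-first volume layout and the toy arrays are assumptions for illustration.

```python
import numpy as np

# Lower bounds (in voxels) of the size groups described in the text.
SIZE_EDGES = [3, 11, 26, 101]
SIZE_NAMES = ["tiny", "small", "medium", "large"]


def size_group(n_voxels):
    """Map a lesion's voxel count to its size group; None if < 3 voxels."""
    if n_voxels < SIZE_EDGES[0]:
        return None  # omitted per the clinical protocol
    return SIZE_NAMES[int(np.searchsorted(SIZE_EDGES, n_voxels, side="right")) - 1]


def mean_var_features(channels, lesion_mask):
    """Mean and variance of every channel over the lesion voxels.

    channels: array (C, X, Y, Z), e.g. T1/T2/PD/FLAIR intensities or the
    WM/GM/CSF/PV prior probabilities; lesion_mask: boolean array (X, Y, Z).
    """
    voxels = channels[:, lesion_mask]  # shape (C, n_lesion_voxels)
    return np.concatenate([voxels.mean(axis=1), voxels.var(axis=1)])


# Toy example: 2 channels on a 4x4x4 grid and a 3-voxel lesion.
channels = np.zeros((2, 4, 4, 4))
channels[0] = 1.0
channels[1, :2] = 2.0
mask = np.zeros((4, 4, 4), dtype=bool)
mask[:3, 0, 0] = True

print(size_group(int(mask.sum())))         # tiny
feats = mean_var_features(channels, mask)  # [mean_0, mean_1, var_0, var_1]
```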
3.1 Disease Activity Prediction
Each multi-channel volume in the clinical trial dataset was treated as a baseline acquisition from which a BoL representation was inferred. Labels for the emergence of new and enlarging lesions 2 years after baseline were available for all but 250 patients, who did not complete the study; these labels were used as indicators of future disease activity. A random forest classifier was used for optimal lesion-type selection and for the prediction of P(A|BoL). 50-fold cross-validation experiments were performed on the untreated (placebo) dataset. Figure 4 compares the maximum likelihood random forest results with several classifiers: (1) Nearest Neighbour (NN), where activity was assigned based on the closest training case under three different distance metrics (Euclidean, Mahalanobis and \(\chi ^2\); Note 2); (2) Support Vector Machines (SVM; Note 3) with linear, RBF and \(\chi ^2\) kernels; and (3) a Naive Bayes classifier based solely on the number of lesions in each size bin, to explore whether lesion counts were the dominating factor in our framework. Both NN and SVM were based on the BoL representations. Overall, the random forest classifier (\(\alpha =0.5\)) performed favourably against the other methods, with mean sensitivity of 70% and specificity of 58% (for \(A=1\)). All methods based on the BoL representation outperformed the Naive Bayes method. When considering only activity predictions with high probabilities (above \(\alpha = 0.8\)), sensitivity increased to 94%. However, specificity dropped substantially, partly because there were only 14 inactive cases at that threshold.
Fig. 4. Comparison of disease activity prediction results based on 50-fold cross-validation on the placebo dataset: three Nearest Neighbour (NN) methods, three Support Vector Machines (SVM), the proposed Random Forest classifier (\(\alpha =0.5\)), and a Naive Bayes classifier trained only on the number of lesions of each size.
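Sensitivity and specificity for the active class at a threshold α can be computed directly from predicted probabilities. In this sketch `sensitivity_specificity` is a hypothetical helper and the toy labels are invented; in a 50-fold experiment the probabilities could come, for example, from scikit-learn’s `cross_val_predict(model, X, y, cv=50, method="predict_proba")[:, 1]`.

```python
import numpy as np


def sensitivity_specificity(y_true, p_active, alpha=0.5):
    """Predict active (A=1) when P(A=1|BoL) > alpha, then return
    (sensitivity, specificity) with A=1 as the positive class."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(p_active) > alpha
    tp = np.sum(y_pred & y_true)
    fn = np.sum(~y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    return tp / (tp + fn), tn / (tn + fp)


sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [0.9, 0.6, 0.3, 0.2, 0.7])
# sens = 2/3 (two of three active patients detected), spec = 1/2
```

The helper does not guard against an empty class, which matters at high thresholds where, as noted above, very few inactive cases may remain.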
Two treated groups of patients were available for training and testing a separate activity prediction model under the effects of treatment after baseline. In 50-fold cross-validation of the random forest classifier on the treated cases, sensitivities increased to close to 1 for both treatments at a high probability threshold (\(\beta =0.8\)), with specificities of around 0.5. Interestingly, when patients in the treated groups were tested using the untreated model, specificity decreased by 7% for both treatments (\(\alpha =0.5\)) because of an increase in false positive predictions. This supports the effectiveness of the treated patient prediction model and suggests that, for some patients, the treatment is effectively halting the formation of new or enlarged lesions.
3.2 Responder Identification
We define “responders” (\(R=1\)) as those patients in the treated groups whose baseline scans lead to a high predicted probability of activity two years later under the untreated model (\(\alpha = 0.8\)), but whose known outcome is inactive (i.e. no new or enlarged T2 lesions). At this probability threshold, the sensitivity for detecting activity in the untreated patients is 98%. Using this definition, there were 25 responders in the Drug A treatment arm and 24 responders in the Drug B treatment arm. Table 1 shows the results of responder classification for the two treatments in the clinical trial dataset when both probability thresholds are set high (\(\alpha =\beta =0.8\)). The results indicate that a response to treatment can be reliably predicted for a small subset of patients, even though the overall objectives of the clinical trial were not met.
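The proxy ground-truth label R can be sketched as follows, assuming `p_untr_active` holds the untreated model’s P_untr(A=1|BoL) for treated patients and `observed_active` their known 24-month outcome; the helper name and values are hypothetical.

```python
import numpy as np


def proxy_responder_labels(p_untr_active, observed_active, alpha=0.8):
    """R = 1 when the untreated model predicts activity with probability
    above alpha but the treated patient showed no new or enlarged T2 lesions."""
    predicted_active = np.asarray(p_untr_active, dtype=float) > alpha
    inactive = ~np.asarray(observed_active, dtype=bool)
    return (predicted_active & inactive).astype(int)


R = proxy_responder_labels([0.90, 0.85, 0.50, 0.95], [0, 1, 0, 0])
print(R, R.sum())  # [1 0 0 1] 2
```

Comparing these labels, per treatment arm, with the output of the two-model decision rule would give responder-classification sensitivity and specificity of the kind summarized in Table 1.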
4 Conclusion
In this paper, we introduce a fully automatic, probabilistic framework for the prediction of future MS disease activity in patients based on a new Bag-of-Lesions representation of their scans at baseline. We develop a probabilistic codebook of distinct lesion types across the population, and show how those lesion types can be used to separate patients into groups that present similar lesion patterns. Additional clinical validation is required to determine how this translates into discoveries of natural patterns of MS disease variability. The activity prediction is then used to automatically identify potential responders to two treatments in the context of a real, large, multi-centre, multi-scanner clinical trial for RRMS patients, showing sensitivities of 82% and 84% and specificities of 92% and 94% respectively. This suggests the possibility of a tool for personalized treatment for new MS patients, and for assessing treatment efficacy.
Notes
1. In particular, given our features, we have 4 GMMs: for RIFT, LBP, prior and intensity features.
2. The Mahalanobis distance normalizes distance based on the covariance matrix of the lesion types, and the \(\chi ^2\) distance measures the distance between histograms.
3. Using the scikit-learn 0.18 package, which wraps the libsvm implementation.
References
1. Gold, R., et al.: Placebo-controlled phase 3 study of oral BG-12 for relapsing multiple sclerosis. New Engl. J. Med. 367(12), 1098–1107 (2012)
2. Brown, J.W.L., Chard, D.T.: The role of MRI in the evaluation of secondary progressive multiple sclerosis. Expert Rev. Neurother. 16(2), 157–171 (2016)
3. Barkhof, F., et al.: Comparison of MRI criteria at first presentation to predict conversion to clinically definite multiple sclerosis. Brain 120(11), 2059–2069 (1997)
4. Brosch, T., Yoo, Y., Li, D.K.B., Traboulsee, A., Tam, R.: Modeling the variability in brain morphology and lesion distribution in multiple sclerosis by deep learning. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 462–469. Springer, Cham (2014). doi:10.1007/978-3-319-10470-6_58
5. Yoo, Y., et al.: Deep learning of brain lesion patterns for predicting future disease activity in patients with early symptoms of multiple sclerosis. In: International Workshop Large-Scale Annotation of Biomedical Data, pp. 86–94 (2016)
6. Popescu, V., et al.: Brain atrophy and lesion load predict long term disability in multiple sclerosis. J. Neurol. Neurosurg. Psych. 84(10), 1082–1091 (2013)
7. Bosch, A., Zisserman, A., Muñoz, X.: Scene classification via pLSA. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 517–530. Springer, Heidelberg (2006). doi:10.1007/11744085_40
8. Shiee, N., et al.: A topology-preserving approach to the segmentation of brain images with multiple sclerosis lesions. NeuroImage 49(2), 1524–1535 (2010)
9. Díaz-Uriarte, R., De Andres, S.A.: Gene selection and classification of microarray data using random forest. BMC Bioinform. 7(1), 3 (2006)
10. Lazebnik, S., et al.: A sparse texture representation using local affine regions. PAMI 27, 1265–1278 (2005)
11. Ahonen, T., et al.: Face description with local binary patterns: application to face recognition. PAMI 28(12), 2037–2041 (2006)
12. Smith, S.M.: Fast robust automated brain extraction. Hum. Brain Mapp. 17(3), 143–155 (2002)
13. Sled, J.G., et al.: A nonparametric method for automatic correction of intensity nonuniformity in MRI data. TMI 17(1), 87–97 (1998)
14. Elliott, C., et al.: Temporally consistent probabilistic detection of new multiple sclerosis lesions in brain MRI. TMI 32(8), 1490–1503 (2013)
Acknowledgements
This work was supported by the Canadian NSERC Discovery and CREATE grants. We would like to thank Drs. Narayanan and Maranzano for their clinical advice, and Mr. A. Zografos Caramanos for data preparation. All patient MRI are courtesy of NeuroRx Research.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Doyle, A., Precup, D., Arnold, D.L., Arbel, T. (2017). Predicting Future Disease Activity and Treatment Responders for Multiple Sclerosis Patients Using a Bag-of-Lesions Brain Representation. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D., Duchesne, S. (eds) Medical Image Computing and Computer Assisted Intervention − MICCAI 2017. MICCAI 2017. Lecture Notes in Computer Science, vol 10435. Springer, Cham. https://doi.org/10.1007/978-3-319-66179-7_22
DOI: https://doi.org/10.1007/978-3-319-66179-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66178-0
Online ISBN: 978-3-319-66179-7
eBook Packages: Computer Science, Computer Science (R0)