1 Introduction

Multiple Myeloma (MM) accounts for 10% of all bone marrow malignancies with an incidence rate of 6/100000 per year in western countries [5]. It is the second most common blood affecting malignancy, which disturbes the generation pathway of plasma cells and B-lymphocytes. Consequently, these cells proliferate uncontrolled and are transformed in a malignant way [10]. In addition, the production of large amounts of non-functional monoclonal antibodies is enforced, which affects the function of kidneys, increases the deficiency in immune response and in an advanced stage, influences the generation of bone forming and resorbing cells. MM starts at a precursor state of Monoclonal Gammopathy of Undertmined Significance (MGUS) and further envolves to an asymptomatic form of the disease smoldering Multiple Myeloma (sMM) with a predictable progression to the symptomatic form of MM [4].

Longitudinal bone infiltration patterns of MM progression. The increased amount of plasma cells in MM leads to the alteration of bone remodelling mechanisms, by promoting bone resorption and inhibiting bone formation [10]. This results first in the formation of focal or diffuse bone marrow infiltration. The gold standard for observing these initial infiltration patterns is MRI (T1, T2) [1, 4, 8]. Subsequently, the progression of the disease leads to the building of osseous destructions, which are observable using low-dose Computer Tomography (CT) [6]. Figure 1 illustrates the infiltration pattern of a focal lesion evolving at the distal part of the femur over three examination time points of a single patient.

Fig. 1.
figure 1

Visualisation of an infiltration pattern of a focal lesion (yellow) using T1 weighted MRI scans over multiple examination time points of one patient.

Challenges. Challenges of tracking lesions over time are, identifying early signatures of their emergence, accurate alignment of subject whole body images, imaging artefacts, and subtle non rigid deformations, as well as capturing the heterogeneity of diffuse infiltration patterns and their imaging signatures. Different treatment strategies and patient specific treatment response, and progression speed cause further variability. According to the results reported in the recent study of Mateos et al. [7], it is particularly important to assess high-risk sMM patients for developing MM and corresponding infiltration patterns, since a benefit for the patient from early therapy is observable.

Contribution. Here, we propose and evaluate a predictor for future bone infiltration patterns. The algorithm uses longitudinal data to learn a local predictor of lesion emergence and change. We assess longitudinal relationships between subsequent stages of bone lesions and corresponding infiltration patterns of MM patients to provide a predictive signature for bone lesions. The contribution of the paper is three fold: (1) the longitudinal alignment of multiple bodyparts in whole body MRIs, (2) a classifier incorporating data from different disease stages in MM and (3) a local lesion risk score (LRS) to identify bone regions with a higher probability to evolve to diffuse or osteolytic lesions. We first give an overview of methodology and the data in Sect. 2. The evaluation results are presented in Sect. 3 and the conclusion of this work and possibilities for future work are summarized in Sect. 4.

2 Methodology

We first describe longitudinal alignment, and then introduce the methodology to estimate a local Lesion Risk Score (LRS) for future lesion emergence. Figure 2 illustrates the computation pipeline of the lesion risk score. It consists of four components: data acquisition, data preprocessing, train and test data design and LRS computation. Details regarding the train and test data design are summarized in Sect. 3.

Fig. 2.
figure 2

Lesion Risk Score computation pipeline

Longitudinal Alignment. To perform subject specific longitudinal analysis of subsequent lesion states, we first register a patient’s data \(I_{t_i} = \{I_{t_1},...I_{t_M}\}\). A patient’s image at a timepoint \(t_i\) is aligned to all subsequent timepoints \(x = t_{i+1},...,t_M\), depending on the number of available data. Bias field correction is used to process imaging data before alignment using FASTFootnote 1 [3] integrated in the FMRIB Software Library (FSL)Footnote 2. The aligned image \(\overline{I_{t_i}(x)}\) is obtained following a two step registration procedure (cf. Eq. 1).

$$\begin{aligned} \overline{I_{t_i}(x)} = I_{t_i} \circ \phi _{NR}((A*I_{t_i}),I_{x}) \end{aligned}$$
(1)

Although longitudinal data is registered, patients’ shape vary over time, since the median inverval between MRIs is 13 months. Also the acquired images do not visualise exactly the same body part snippets. Thus, we decided (based on experimental results), to use an affine approach first. This affine alignment A is performed using a block matching method for global registration. The resulting transformed image (\(A*I_{t_1}\)) is used as moving image in the second alignment step. Subsequently, a non-rigid deformation \(\phi _{NR}\) to the target at time point x is estimated. For affine registration the function reg_aladin and for non rigid alignment the function reg_f3d are used, which are integrated in the NiftyReg toolboxFootnote 3 [9]. For assessing the quality of registration, moving and target image were inspected manually using overlay visualisations and evaluated regarding correspondence of lesions’ position between the different time points.

Patch Creation and Data Augmentation. After alignment, MR images \(\overline{I_{t_1}(x)}\) of a patient correspond to timepoints x and corresponding annotations \(S_x\) of the lesion in a future image of the patient. After the longitudinal alignment of acquisitions of one patient, datasets are created by using the intensity image of a subject at time point t and to this time point aligned annotations of lesions of subsequent time points \(t~+~i\), where i is 1, ..., n (n... number of available subsequent time points). Given those data pairs, patches are created in the lesion regions, by first computing the barycentre of the lesion’s annotation in the subsequent state and randomly moving a clipping window around it to avoid the predictor to learn uniform lesion positions. Additionally, we repeated this procedure for rotated intensity images and corresponding annotations. The rotations were performed in 20 degree steps.

Local Lesion Risk Score. The proposed local Lesion Risk Score (LRS) uses early signatures of emerging bone lesions to predict future lesions and mark corresponding high risk locations. We use the computed pairs of image patches and corresponding lesion annotations of a subsequent state to train a random forest classifier that predicts future lesion labels from the present image patch data. During application, a score is obtained for each voxel position V by the probability predicted by the trained random forest for a new input patch.

$$\begin{aligned} LRS_V = P_{RF} (\overline{I_{t_i}(x)}) \end{aligned}$$
(2)

We used the Python framework sklearn with an integrated Random Forest predictorFootnote 4 with 10 decision trees and the following parametrisation: n_estimators = 10, criterion = ‘gini’, max_depth = 2, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = ‘auto’, max_leaf_nodes = None, min_impurity_decrease = 0.0, min_impurity_split = None, bootstrap = True, oob_score = False, n_jobs = 1, random_state = None, verbose = 0, warm_start = False, class_weight = None.

3 Results

In this section an overview of the evaluation dataset is given and the quantitative as well qualitative LRS evaluation results are presented and discussed.

Dataset. In this study 220 longitudinal whole body (wb) MRIs from 63 patients with smoldering multiple myeloma (following the 2003 guidelines [2]) were acquired between 2004 and 2011. At least one wbMRI was performed per patient. The annotation of focal lesions is performed manually by medical experts starting at a lesion size bigger than 5 mm [1], since according to the IMWG consensus statement, from this size on, patients are considered to have symptomatic myeloma with therapy requirement. Table 1 summarizes the study participant’s demographics. The protocol of this study was approved by the institutional ethics committee and all subjects gave their informed consent prior to inclusion. The scanning was performed on a 1.5 Tesla Magnetom Avanto (Siemens Healthineers, Erlangen, Germany) scanner. For the T1 weighted acquisition a turbo spine echo sequence (repetition time (TR): 627 milliseconds (ms), echo time (TE): 11 ms, section thickness (ST): 5 mm, acquisition time (TA): 2:45 min) was performed of the head, thorax, abdomen, pelvis and legs using a coronal orientation and for the spine in sagital orientation. No contrast medium was given. The duration of a scan was approximately 40 min long.

Table 1. Participants’ demographics

Train and Test Data Design. In this study we used acquisitions from two body regions: In region 1 thorax, abdomen and pelvis are visualised and in region 2 the lower part of the pelvis, femurs and knees. These areas are considered since most lesions occur in those. In this work we observe two types of lesions and evaluate the performance of the methodology proposed separately for every type, with corresponding train and test sets: lesions which are emerging, i.e. which are not reported in the first scan, but in the subsequent scan, and changing lesions, which are annotated at both observed examination time points. For every patient we extracted image patches at lesion regions longitudinally over subsequent states of three different sizes (10 \(\times \) 4 \(\times \) 10, 20 \(\times \) 4 \(\times \) 20 and 30 \(\times \) 4 \(\times \) 30 voxels with a voxelspacing of 1.302 mm \(\times \) 6 mm \(\times \) 1.302 mm). To obtain a higher number of patches for the predictor training, data augmentation is performed resulting in 18 different patches per lesion. To summarize, for emerging lesions we obtain 720 patches for region 1 and 504 patches for region 2 and for growing lesions we created 1026 patches for region 1 and 810 patches for region 2. Crossvalidation is used to generate test and training datasets, where a testset consists of 18 patches of a single lesion including the volumes of different orientation, which results in 40 folds for emerging lesions in region 1 and 28 in region 2 and 57 folds for growing lesion in region 1 and 45 in region 2.

Evaluation Setup. For the quantitative evaluation and for obtaining comparability between the different tested setups, the Area Under the Curve (AUC) is computed, based on the probability estimates of the local lesion risk predictor for the test patch using scikit learnFootnote 5. We used thresholding to obtain a predicted label for visualisation and comparison. We have to point out that an exact matching of predicted label and subsequent annotation is not achievable, given a pre-stage and not required since we aim to predict risk of lesion growth or emergence and not to estimate exact segmentations of future lesions.

Table 2. Summary results LRS performance
Fig. 3.
figure 3

Prediction of lesion growth from examination time point to time point in body region 1. The predicted label is visualised in the second column, below the underlying Local Lesion Risk Score probability map is shown and the manual annotation is visualised in the third column.

3.1 Evaluation Body Region 1

In Table 2 in column three the mean AUC for emerging and growing lesion types for body region 1 are summarized. For every lesion type three different patch sizes are evaluated. Figure 3 illustrates a prediction result for a growing lesion of a region 1 acquisition. The test image (left) is a transformed image from examination time point 003 to 004 using the longitudinal alignment approach introduced in Sect. 2. The extracted patch of this image in the region of the lesion visible in the image I\(_{003}\) is visualised in the first row in the center, with the predicted label in the second column and the annotation of the future lesion position extracted from image I\(_{004}\) in the third column. In the second row the predicted probability map of the local lesion risk score is visualised, where yellow shows regions of high probability and blue of low probability.

3.2 Evaluation Body Region 2

In Table 2 in column four the mean AUC for emerging and growing lesion types for body region 2 are summarized. For every lesion type three different patch sizes are evaluated. Figure 4 illustrates a prediction result for an emerging lesion of a region 2 acquisition. The test image (left) is a transformed image from examination time point 002 to 003. The extracted patch of this image in the region of the lesion visible in the target image I\(_{002}\) (right) is visualised, with the predicted label in the second column and the annotation of the future lesion position extracted from image I\(_{003}\) in the third column. In the second row the predicted probability map of the local lesion risk score is visualised, where yellow shows regions of high probability and blue of low probability.

Fig. 4.
figure 4

Prediction of an emerging lesion from examination time point 002 to time point 003 in body region 2. The predicted label is visualised in column 2, below the underlying Local Lesion Risk Score probability map is shown and the manual annotation is visualised in column 3.

3.3 Discussion

For both lesion types a decrease of the mean AUC is observable with increasing patch size, where emerging lesions show a steeper decrease as growing lesions in both bodyparts. Also growing lesions in the femur, knees or pelvis are better predicted than those in the thoracic or abdominal body region.

4 Conclusion

In this work we present a local Lesion Risk Predictor for accessing and visualising regions of high risk for bone lesions to emerge or to grow. We trained a random forest predictor using lesion image patches and annotations of subsequent lesions states of the longitudinal MR T1 weighted dataset observed. The main challenge here was to achieve an accurate longitudinal alignment of subsequent examination time points of a patient. To our knowledge this is the first attempt to predict bone infiltration patterns in MM using T1 weighted MR images. Current approaches focus on lesion detection (e.g. [11] for CT images) using deep learning techniques. We decided to investigate a lesion predictor based on a random forest classifier first, since its setup, parametrisation and evaluation is simpler compared to deep learning approaches. However, we incorporated the possibility to extend the proposed patch based approach to evaluate deep architectures for lesion prediction in the future. To this point prediction using the introduced local lesion risk score is limited to image patches. For future work we aim to adapt the proposed score to be able to predict probability maps for entire volumes. Additionally, we will incorporate different modalities, and data from additional bodyparts in the framework proposed to longitudinally model infiltration and also osseous destruction patterns caused by the progress of multiple myeloma.