1 Introduction

Automatic quantitative analysis of brain tumours assists in faster, more accurate diagnosis and surgical planning. Developing accurate and reliable tumour segmentation from multi-modal MRI remains a challenging task due to many sources of variability, including tumour type, shape and size, and intensity and contrast differences across MR images. Classical approaches that have been successfully applied to tumour segmentation include multi-atlas segmentation, probabilistic graphical models such as Markov Random Fields (MRF) [6] and Conditional Random Fields (CRF), and Random Forests (RF) [7]. Methods based on generative models have also been explored for tumour segmentation [8].

Inspired by the success of deep learning in many tasks on natural images, such as semantic segmentation [10], object detection [11], and classification [12], many deep learning based approaches have been proposed for tasks in medical imaging, such as segmentation [13], synthesis [14], and classification [15]. Various CNN architectures have been explored for brain tumour segmentation which model global and local image context either explicitly [9, 16] or implicitly [17, 18]. These architectures either take MR images at multiple resolutions as input [9, 16] or process single-resolution MR images at multiple scales [17, 18]. One advantage of deep learning based approaches over classical segmentation methods such as MRF and RF is that they do not require hand-crafted features, because the networks are trained in an end-to-end manner with appropriate loss functions. In recent BraTS challenges [1], deep learning based approaches have outperformed classical methods.

In this work, we develop a modified version of the popular 3D U-net [13] architecture for the brain tumour segmentation task on the BraTS 2018 datasets. The U-net architecture has been successfully applied to many medical imaging segmentation tasks, such as liver and lesion segmentation [19], retinal layer segmentation [20], and organ segmentation [21]. In this paper, the 3D U-net is trained using a weighted Categorical Cross Entropy (CCE) loss function on the BraTS 2018 training dataset, and a curriculum on class weights is employed to address class imbalance [26]. We achieved competitive results on the BraTS 2018 [5] validation and testing datasets, with Dice scores of 0.788, 0.909, and 0.825 on the validation dataset, and 0.706, 0.871, and 0.771 on the testing dataset, for enhancing tumour, whole tumour, and tumour core, respectively.

Fig. 1. The 3D U-net CNN architecture takes as input four full 3D MR image sequences and generates a multi-class segmentation of the tumour into sub-types.

2 Method

A flowchart of the 3D U-net architecture can be seen in Fig. 1. The network takes as input full 3D volumes of all available sequences of a patient and generates a multi-class segmentation of the tumour into sub-types at the same resolution. The 3D U-net is similar to the one proposed in [13], with some modifications. The U-net consists of 4 resolution steps for both the encoder and decoder paths. At the start, we use 2 consecutive 3D convolutions of size \(3\times 3\times 3\) with k filters, where k denotes the user-defined initial number of convolution filters. Each step in the encoder path consists of 2 3D convolutions of size \(3 \times 3 \times 3\) with \(k * 2^n\) filters, where n denotes the U-net resolution step, followed by average pooling of size \(2 \times 2 \times 2\). We chose average pooling instead of max pooling as it allows better gradient flow between consecutive layers. At the end of each encoder step, instance normalization [22] is applied, followed by dropout [23] with 0.05 probability. Instance normalization was preferred over batch normalization due to memory constraints, as we were able to fit only one volume at a time in the available GPU memory. At each step in the decoder path, a 3D transposed convolution of size \(3 \times 3 \times 3\) with \(2 \times 2 \times 2\) stride and \(k * 2^n\) filters performs the upsampling. The output of the transposed convolution is concatenated with the corresponding output of the encoder path. We chose transposed convolution because it allows the network to learn an optimal interpolation function, rather than relying on the pre-defined interpolation of standard upsampling. This is, once again, followed by instance normalization and dropout with 0.05 probability. Finally, 2 3D convolutions of size \(3 \times 3 \times 3\) with \(k * 2^n\) filters are applied. The rectified linear unit (ReLU) is used as the non-linearity for every convolution layer. The last layer has C filters, where C denotes the total number of classes, and is followed by a SoftMax non-linearity.
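To make the step structure concrete, the following is a minimal sketch of one encoder step and one decoder step as described above. The paper does not specify an implementation framework, so the use of PyTorch and all module and variable names here are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class EncoderStep(nn.Module):
        """One encoder step: two 3x3x3 convolutions (ReLU), 2x2x2 average
        pooling, instance normalization, then dropout (p = 0.05)."""
        def __init__(self, in_ch, k, n, p_drop=0.05):
            super().__init__()
            out_ch = k * 2 ** n  # k * 2^n filters at resolution step n
            self.block = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AvgPool3d(kernel_size=2),   # average pooling, not max
                nn.InstanceNorm3d(out_ch),     # instance norm (batch size 1)
                nn.Dropout3d(p=p_drop),
            )

        def forward(self, x):
            return self.block(x)

    class DecoderStep(nn.Module):
        """One decoder step: strided 3x3x3 transposed convolution (learned
        upsampling), concatenation with the encoder skip connection,
        instance normalization and dropout, then two 3x3x3 convolutions."""
        def __init__(self, in_ch, skip_ch, k, n, p_drop=0.05):
            super().__init__()
            out_ch = k * 2 ** n
            self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=3,
                                         stride=2, padding=1, output_padding=1)
            self.post = nn.Sequential(
                nn.InstanceNorm3d(out_ch + skip_ch),
                nn.Dropout3d(p=p_drop),
                nn.Conv3d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, x, skip):
            return self.post(torch.cat([self.up(x), skip], dim=1))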

2.1 Loss Function

We optimize a weighted Categorical Cross Entropy (CCE) loss function during training, given by:

$$\begin{aligned} CCE^i = -\sum _n w_n^i \sum _l t_{n,l}^{i} \log p_{n,l}^{i} \end{aligned}$$
(1)
$$\begin{aligned} w_n^i = w_l * y_n^i \qquad \text {where } w_l = \left( \frac{\sum _{k=1}^{C} m_k}{m_l}\right) * r^{ep} + 1, \end{aligned}$$
(2)

where \(w_n^i\) and \(w_l\) denote the weight for voxel n of volume i and the weight of class l, respectively, \(m_l\) is the total number of voxels of the \(l^{th}\) class in the training dataset, and C denotes the total number of classes. The class weights \(w_l\) decay over epochs ep at a rate \(r \in [0,1]\). Note that \(w_l\) converges to 1 as ep becomes large, ensuring that all samples receive equal weight in the later stages of training. This method of weighting classes is known as curriculum class weighting [26].
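As a concrete illustration, the following is a minimal sketch of Eqs. (1) and (2), assuming PyTorch; the function names and the per-voxel weighting via label indexing are illustrative assumptions rather than the authors' code.

    import torch
    import torch.nn.functional as F

    def class_weights(voxel_counts, r, epoch):
        # voxel_counts[l] = m_l, the voxel count of class l over the training set
        total = voxel_counts.sum()
        # Eq. (2): weights start near the inverse class frequency and decay to 1
        return (total / voxel_counts) * (r ** epoch) + 1.0

    def weighted_cce(logits, target, w_l):
        # logits: (1, C, D, H, W) network outputs; target: (1, D, H, W) labels
        log_p = F.log_softmax(logits, dim=1)
        nll = F.nll_loss(log_p, target, reduction='none')  # -log p of true class
        return (w_l[target] * nll).sum()  # Eq. (1): voxel n gets its class weight

The weights would be recomputed at the start of each epoch, so the strong re-weighting of rare classes early in training relaxes toward uniform weighting as \(r^{ep}\) decays.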

3 Experiments and Results

3.1 Data

BraTS 2018 Training Set: The BraTS 2018 training dataset comprises 210 high-grade and 75 low-grade glioma patient MRIs. For each patient, T1, T1 post-contrast (T1c), T2, and Fluid Attenuated Inversion Recovery (FLAIR) MR volumes are provided, along with an expert tumour segmentation. Each brain tumour is manually delineated into 3 classes: edema, necrotic/non-enhancing core, and enhancing tumour core [1,2,3,4,5].

BraTS 2018 Validation Set: The BraTS 2018 validation dataset comprises 66 patient MRIs. For each patient, T1, T1c, T2, and FLAIR MR volumes are provided. No expert tumour segmentation masks are provided, and the grade of each glioma is not specified [1,2,3,4,5].

BraTS 2018 Testing Set: The BraTS 2018 testing dataset comprises 191 patient MRIs. As with the validation dataset, T1, T1c, T2, and FLAIR MR volumes are provided for each patient, but expert tumour segmentation masks are not. The grade of each glioma is also not specified [1,2,3,4,5].

3.2 Pre-processing

The BraTS challenge provides isotropic, skull-stripped, and co-registered MR volumes. We follow this with a few additional pre-processing steps. Each volume's intensities were normalized by subtracting the mean and dividing by the standard deviation, then re-scaled to the range [0, 1], and the volumes were cropped to \(184 \times 200 \times 152\).
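A minimal NumPy sketch of these steps is given below; the text does not state where the crop is placed, so the centre crop shown is an assumption.

    import numpy as np

    def preprocess(volume, out_shape=(184, 200, 152)):
        v = (volume - volume.mean()) / volume.std()  # mean subtraction, std division
        v = (v - v.min()) / (v.max() - v.min())      # re-scale to [0, 1]
        # crop to the target shape (centre crop assumed)
        starts = [(s - o) // 2 for s, o in zip(v.shape, out_shape)]
        return v[tuple(slice(st, st + o) for st, o in zip(starts, out_shape))]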

3.3 5-Fold Cross Validation

We performed 5-fold cross validation on the training dataset. The BraTS 2018 training dataset was randomly split into five folds of 57 patients each, such that each fold contains 42 high-grade and 15 low-grade glioma patients. We trained our network five times, with 4 folds used to train the network and the remaining fold used to validate it.

Note that we use all five networks obtained from cross-validation as an ensemble to predict segmentations for the BraTS 2018 validation and testing datasets. We view this ensemble as bagging [25], which has been shown to improve performance over a single model.
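A brief sketch of this ensembling is shown below, assuming each model outputs per-voxel class scores. Averaging the softmax probabilities before taking the arg-max is one common fusion choice and is an assumption here, as the text does not state how the five predictions are combined.

    import torch

    def ensemble_predict(models, volume):
        # volume: (1, 4, D, H, W); average softmax probabilities over the 5 models
        with torch.no_grad():
            probs = torch.stack([m(volume).softmax(dim=1) for m in models])
        return probs.mean(dim=0).argmax(dim=1)  # (1, D, H, W) label map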

Parameters. In our network, we set the initial number of filters \(k = 20\) and the number of filters in the last layer \(C = 4\). We optimize the loss function in Eq. (1) using Adam [24] with a learning rate of 0.001 and a batch size of 1. The network is trained for a total of 240 epochs, and the learning rate is decayed by a factor of 0.75 every 50 epochs. The decay rate r in Eq. (2) is set to 0.95. We regularize the model using data augmentation: at each training iteration, a random affine transformation is applied to the MR volumes and the corresponding segmentation mask. Random translation, rotation, scaling, and shear transformations are applied, with ranges sampled from uniform distributions over \([-5,5]\), \([-3^{\circ },3^{\circ }]\), \([-0.1,0.1]\), and \([-0.1,0.1]\), respectively. Volumes are also randomly flipped left to right.
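The stated schedule and augmentation ranges could be set up as follows (a PyTorch/NumPy sketch; the framework and helper names are assumptions, and the sampled affine parameters would be applied with any volumetric resampling routine).

    import numpy as np
    import torch

    def make_optimizer(model):
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        # decay the learning rate by a factor of 0.75 every 50 epochs
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                    step_size=50, gamma=0.75)
        return optimizer, scheduler

    def sample_affine_params(rng):
        # rng: a np.random.Generator, e.g. np.random.default_rng()
        return dict(
            translation=rng.uniform(-5, 5, size=3),
            rotation_deg=rng.uniform(-3, 3, size=3),
            scale=1.0 + rng.uniform(-0.1, 0.1, size=3),
            shear=rng.uniform(-0.1, 0.1, size=3),
            flip_lr=rng.random() < 0.5,  # random left-right flip
        )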

Fig. 2. Training (left) and validation (right) Dice scores as a function of the number of epochs for one of the five cross-validation folds.

Learning Curves. Figure 2 shows an example of the evolution of the various Dice scores (Tumour, Enhance, Core, and Average) as a function of the number of epochs for one of the 5 cross-validation folds.

4 Discussion

4.1 Quantitative Results

Our method performed well, resulting in Dice scores of 0.788, 0.909, and 0.825 (BraTS 2018 validation dataset) and 0.706, 0.871, and 0.771 (BraTS 2018 testing dataset) for enhancing tumour, whole tumour, and tumour core, respectively. Tables 1, 2, and 3 show the results of our method based on the different evaluation metric statistics provided by the challenge organizers. The results are based on the following BraTS 2018 experiments: 5-fold cross validation on the training dataset, and tests on the validation and testing datasets. They indicate that the proposed method performs very well on whole tumours and tumour cores, with relatively lower performance on enhancing tumours. This was expected, as segmentation of enhancing tumour relies heavily on T1c images, where it presents similarly to other enhancing structures; for the other tumour sub-types, the remaining modalities assist the segmentation.
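For reference, the Dice score reported throughout is the standard overlap measure \(2|A \cap B| / (|A| + |B|)\), computed per tumour sub-region. A minimal sketch of this standard definition (not the challenge's evaluation code) is:

    import numpy as np

    def dice(pred, truth):
        # pred, truth: binary masks for one tumour sub-region (ET, WT, or TC)
        pred, truth = pred.astype(bool), truth.astype(bool)
        denom = pred.sum() + truth.sum()
        return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, truth).sum() / denom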

Table 1. Evaluation metric statistics for 5-fold cross validation on BraTS 2018 training dataset for enhancing tumour (ET), whole tumour (WT), and tumour core (TC).
Table 2. Evaluation metric statistics for BraTS 2018 validation dataset for enhancing tumour (ET), whole tumour (WT), and tumour core (TC).
Table 3. Evaluation metric statistics for BraTS 2018 testing dataset for enhancing tumour (ET), whole tumour (WT), and tumour core (TC).

4.2 Qualitative Results

Figures 3 and 4 show example slices with the resulting segmentation labels for high-grade and low-grade glioma patients from one fold of the experiments on the BraTS 2018 training dataset. We observe that the network performs much better on high-grade glioma cases. This can be attributed to the fact that there are more training examples of high-grade glioma cases than of low-grade glioma cases. Example slices with predicted segmentation labels on the BraTS 2018 validation and testing datasets can be seen in Figs. 5, 6, and 7.

Fig. 3. Examples of high-grade glioma segmentation results on the BraTS 2018 training dataset. The expert segmentation (Column 2) and the predicted segmentation (Column 3) are overlaid on the T1c MR volume (Column 1). The green label is edema, the red label is non-enhancing or necrotic tumour core, and the yellow label is enhancing tumour core. (Color figure online)

Fig. 4. Examples of low-grade glioma segmentation results on the BraTS 2018 training dataset. The expert segmentation (Column 2) and the predicted segmentation (Column 3) are overlaid on the T1c MR volume (Column 1). The green label is edema, the red label is non-enhancing or necrotic tumour core, and the yellow label is enhancing tumour core. (Color figure online)

Fig. 5. Examples of segmentation results on the BraTS 2018 validation dataset. The predicted segmentation (Column 2) is overlaid on the T1c MR volume (Column 1). The green label is edema, the red label is non-enhancing or necrotic tumour core, and the yellow label is enhancing tumour core. (Color figure online)

Fig. 6. Examples of segmentation results on the BraTS 2018 validation dataset. The predicted segmentation (Column 2) is overlaid on the T1c MR volume (Column 1). The green label is edema, the red label is non-enhancing or necrotic tumour core, and the yellow label is enhancing tumour core. (Color figure online)

Fig. 7. Examples of segmentation results on the BraTS 2018 testing dataset. The predicted segmentation (Column 2) is overlaid on the T1c MR volume (Column 1). The green label is edema, the red label is non-enhancing or necrotic tumour core, and the yellow label is enhancing tumour core. (Color figure online)

5 Conclusion

In this work, we demonstrated how a simple CNN such as the 3D U-net [13] can be successfully applied to the task of tumour segmentation. The U-net processes the input multi-modal MR images at multiple scales, which allows it to learn the local and global context necessary for tumour segmentation. The network was trained using a curriculum on class weights to address class imbalance, showing competitive results for brain tumour segmentation on the BraTS 2018 [5] testing dataset. Our method achieved the following Dice scores for enhancing tumour, whole tumour, and tumour core on the BraTS 2018 [5] validation and testing datasets: 0.788, 0.909, and 0.825 (validation), and 0.706, 0.871, and 0.771 (testing). However, the method showed a degradation in performance on the testing dataset in the enhancing tumour (ET) and tumour core (TC) categories.