Abstract
For accurate tumor segmentation in brain magnetic resonance (MR) images, extreme class imbalance exists not only between the foreground and background, but also among the different sub-regions of the tumor. Inspired by the focal loss [3], which down-weights the well-classified examples, our proposed Focal Dice Loss (FDL) considers the imbalance among the structures of interest rather than the entire image including background. Image dilation is applied to the training samples, which enlarges the tiny sub-regions, bridges the disconnected pieces of tumor structures, and promotes understanding of the overall tumor rather than complex details. The structuring element for dilation is gradually downsized, resulting in a coarse-to-fine, incremental learning process with the network structure unchanged. Our experiments on the BRATS2015 dataset achieve state-of-the-art average Dice coefficients with relatively low computational cost.
1 Introduction
Gliomas are the most frequent primary brain tumors in adults [5], and accurate segmentation of a glioma and its sub-regions is crucial in clinical diagnosis, treatment planning, and post-operative evaluation. However, as shown in Fig. 1, the multiclass segmentation of multimodal brain MR images is very challenging. The major obstacles include the great variance in tumor size, shape, and location, as well as the extreme class imbalance.
Recently, deep convolutional neural networks (CNNs) have achieved remarkable performance in automatic brain tumor segmentation. Specifically, Pereira et al. [6] trained a 2D CNN on patches with data augmentation. A 3D CNN with a multi-scale, multi-stream architecture operates on patches extracted by nonuniform sampling [1], followed by a fully connected conditional random field (CRF) to refine the segmentation output [2]. Based on the fully convolutional network (FCN) [4], Shen et al. [7] introduced a boundary-aware network to achieve multi-task learning on 2D image slices. Zhao et al. [12] integrated FCNNs and CRFs, training on both patches and slices in multiple stages. Additionally, three models are trained on images of axial, coronal, and sagittal views respectively, and combined by a voting-based fusion strategy.
To sum up, all these methods except [7] operate at the patch level and balance the data by controlling the sampling rate [1, 2, 6, 12]. Without prior knowledge, it is hard to extract test patches with the same sampling ratio. Moreover, end-to-end (image to segmentation map) FCN frameworks like [7] are more computationally efficient compared to the patch-based methods, but cannot handle the imbalance by nonuniform sampling or data augmentation.
To address the challenges above, we propose the Focal Dice Loss, inspired by [3], and apply image dilation. To tackle the extreme class imbalance on image slices, our FDL down-weights the well-segmented classes during training. Instead of taking all classes into consideration like the focal loss [3], the FDL emphasizes the imbalance among foreground classes. Meanwhile, dilation is applied to the ground truth of training samples, which allows the network to learn the complex details of tumor structure in a coarse-to-fine manner. This differs from dilated convolution [11], which enlarges the receptive fields of convolutional kernels.
Our major contributions are as follows: (1) we propose the Focal Dice Loss to address class imbalance in multimodal brain tumor segmentation and validate it on a publicly available dataset; (2) to the best of our knowledge, we are the first to apply image dilation to ground truth labels during training with a gradually downsized structuring element, which promotes better high-level understanding; (3) we show that the proposed method achieves state-of-the-art performance in average Dice coefficient with high computational efficiency.
2 Methodology
We employ the U-Net architecture, which takes the full image context into account. As shown in Fig. 2, each block includes 3 convolutional layers of size \(3\times 3\), and each layer is followed by ReLU activation and batch normalization. Max-pooling and up-sampling of size \(2\times 2\) are adopted in the two paths. Feature maps from the contracting path are concatenated to the ones in the expanding path.
2.1 Focal Dice Loss for Highly Unbalanced Data
Focal loss [3], based on standard cross entropy, was introduced to address the data imbalance in dense object detection. It is worth noticing that for brain tumors, the class imbalance exists not only between tumor and background, but also among different sub-regions of the tumor (e.g., necrosis and edema in Fig. 1 and Table 1). It is stated by Sudre et al. [10] that with an increasing level of data imbalance, loss functions based on overlap measurements are more robust than weighted cross entropy. Our experiments in the next section also support this argument. Therefore, the Dice Coefficient is adopted to focus on the tumor sub-regions.
Balanced Dice Loss. The Dice Coefficient (DICE), also called the overlap index, is a commonly used metric in validating medical image segmentation. For the binary ground truth images of each class t, DICE can be written as:

$$DICE_{t} = \frac{2\sum _{i=1}^{N} p_{it}\, g_{it} + \epsilon }{\sum _{i=1}^{N} p_{it} + \sum _{i=1}^{N} g_{it} + \epsilon }$$
In the above, \(g_{it} \in \{0,1\}\) specifies the ground truth label of class t and pixel i, where N indicates the total number of pixels of the image. Similarly, \(p_{it} \in [0,1]\) denotes the output probability. In practice, the \(\epsilon \) term is adopted to guarantee the loss function stability by avoiding the numerical issue of dividing by 0.
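As an illustration, the per-class soft Dice above can be computed as follows (a minimal NumPy sketch; the helper name `dice_per_class` and the (pixels, classes) array layout are our assumptions, not the authors' code):

```python
import numpy as np

def dice_per_class(p, g, eps=1e-7):
    # p: (N, T) predicted probabilities, g: (N, T) one-hot ground truth.
    # Returns the soft DICE_t for each class t, with eps guarding
    # against division by zero as described in the text.
    inter = (p * g).sum(axis=0)              # sum_i p_it * g_it
    denom = p.sum(axis=0) + g.sum(axis=0)    # sum_i p_it + sum_i g_it
    return (2.0 * inter + eps) / (denom + eps)
```

For a perfect prediction (p equal to the one-hot g), every \(DICE_{t}\) evaluates to approximately 1; a class absent from both p and g yields \(\epsilon / \epsilon = 1\) rather than a division-by-zero error.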
A common method for class imbalance is to introduce a weight \(w_{t} \geqslant 0\) for each class t. Therefore, we write the Dice Loss (DL) as:

$$DL = \sum _{t} w_{t} \left( 1 - DICE_{t} \right) $$
Focal Dice Loss. As noted in [3], the extreme class imbalance overwhelms the cross entropy loss during training. We propose to assign lower weights to the well-segmented classes and to focus on the hard classes with a lower DICE.
Formally, a factor \(1/\beta \) is applied as the power of \(DICE_{t}\) for each class, where the exponent parameter \(\beta \geqslant 1\). We define the Focal Dice Loss (FDL) as:

$$FDL = \sum _{t} \left( 1 - DICE_{t}^{1/\beta } \right) $$
The FDL has three properties. (1) If a pixel is misclassified into a class t with a large \(DICE_{t}\) (i.e., the class is well segmented), the FDL is basically unaffected. On the contrary, if \(DICE_{t}\) is small (i.e., the class is poorly segmented), a misclassified pixel changes the FDL significantly. (2) The exponent parameter \(\beta \) smoothly adjusts the rate at which better-segmented classes are down-weighted. The FDL is equal to the DL when \(\beta = 1\). As the exponent factor \(\beta \) increases, the network focuses more on the poorly segmented classes than on the others. (3) Different from the focal loss [3], the overlap-based FDL focuses on the objects of interest instead of the entire image, which meets the demands of brain tumor segmentation.
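Under our reading of the definition above, the FDL can be sketched as follows (hypothetical NumPy code summing \(1 - DICE_{t}^{1/\beta }\) over the foreground classes; the function names are our own, not the authors' implementation):

```python
import numpy as np

def dice_per_class(p, g, eps=1e-7):
    # Soft DICE_t for each class t (columns of p and g).
    inter = (p * g).sum(axis=0)
    denom = p.sum(axis=0) + g.sum(axis=0)
    return (2.0 * inter + eps) / (denom + eps)

def focal_dice_loss(p, g, beta=2.0, eps=1e-7):
    # FDL = sum_t (1 - DICE_t ** (1 / beta)); beta = 1 recovers the
    # Dice Loss with unit class weights.
    d = dice_per_class(p, g, eps)
    return float(np.sum(1.0 - d ** (1.0 / beta)))
```

Since \(DICE_{t}^{1/\beta } \geqslant DICE_{t}\) on \([0,1]\) for \(\beta \geqslant 1\), a larger \(\beta \) shrinks the loss contribution of well-segmented classes proportionally faster than that of poorly segmented ones, which is the focusing behavior of property (2).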
The FDL is visualized for several values of \(\beta \in [1,4]\) in Fig. 3 (we found \(\beta =2\) to work best in our experiments). We have validated the FDL on the BRATS2015 dataset, which shows a clear improvement, especially for the small classes.
2.2 Dilation for Coarse-to-Fine Learning
Dilation. Dilation is one of the operators in the area of mathematical morphology. The effect of this operator on binary or grayscale images is to enlarge the boundaries of foreground pixels using a structuring element. Mathematically, the dilation of A by B, denoted \(A \oplus B\), is defined in terms of set operations:

$$A \oplus B = \left\{ z \mid (\hat{B})_{z} \cap A \ne \varnothing \right\} $$
where \(\varnothing \) is the empty set and B is the structuring element, \(\hat{B}\) is the reflection of set B and \((B)_{z}\) is the translation of B by point \(z = ( z_{1}, z_{2})\).
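Concretely, dilating a single foreground pixel by a \(3\times 3\) square structuring element grows it into a \(3\times 3\) square, as this small SciPy sketch illustrates (the element shape here is illustrative, not one of those used in the paper):

```python
import numpy as np
from scipy import ndimage

A = np.zeros((5, 5), dtype=bool)
A[2, 2] = True                       # single foreground pixel
B = np.ones((3, 3), dtype=bool)      # 3x3 square structuring element

# A dilated by B: every position z where the reflected, translated B hits A
D = ndimage.binary_dilation(A, structure=B)
```

The result `D` is a \(3\times 3\) block of foreground pixels centered at (2, 2).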
In image processing, one application of dilation is bridging the gaps between disconnected but close components, like broken characters. Similarly, we apply dilation to the ground truth to expand the objects and link the disconnected parts. We aim at higher-level feature extraction and therefore compromise on some low-level details in the early training stage.
Dilation on the Ground Truth. In our proposed method, dilation is applied to the binary ground truth images of each foreground class in the training set with a probability ratio \(\alpha \). Figure 4(f) to (j) show that the structuring element for dilation shrinks gradually in size during training, resulting in a coarse-to-fine learning process. Note that eventually no dilation is applied (dilation by the structuring element in Fig. 4(j) leaves images unchanged). No dilation is applied to validation or test images in any of the experiments.
After dilation, the dilated ground truth of different classes may overlap, and pixels in the overlapping region classified into any of the intersecting classes will reduce the loss function. Under this circumstance, the FDL is able to focus on the classes with lower DICE.
In practice, the dilation has the following properties. (1) It expands the tiny regions and connects the close but separated pieces (Fig. 4(a) to (e)). Therefore, the ground truth of each foreground class shrinks from the dilated coarse features to the original fine labels. It also helps the network focus on higher-level features. (2) Similar to Dropout, which randomly discards units with their connections [9], the stochastic dilation of training labels prevents overfitting because of the dynamic changes during training. (3) The coarse-to-fine scheme also boosts the learning speed as well as the training efficiency.
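The stochastic ground-truth dilation described above could be sketched as follows (a hypothetical helper under our assumptions: one-hot label maps of shape (H, W, C) with channel 0 as background, and square structuring elements instead of the exact shapes in Fig. 4):

```python
import numpy as np
from scipy import ndimage

def dilate_labels(gt_onehot, size, alpha=0.6, rng=None):
    """With probability alpha, dilate each foreground class mask by a
    size x size square structuring element; size <= 1 leaves the
    labels unchanged (the final stage of the schedule)."""
    rng = np.random.default_rng() if rng is None else rng
    if size <= 1 or rng.random() >= alpha:
        return gt_onehot
    selem = np.ones((size, size), dtype=bool)
    out = gt_onehot.copy()
    for t in range(1, gt_onehot.shape[-1]):     # channel 0 = background
        out[..., t] = ndimage.binary_dilation(gt_onehot[..., t],
                                              structure=selem)
    return out
```

During training, `size` would follow a decreasing schedule (e.g., one structuring element per block of epochs), so the labels shrink from coarse dilated masks back to the original fine annotations.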
3 Evaluation
Our method has been evaluated on the BRATS2015 dataset. We use the HG training set, which contains MR images from 220 patients; for each patient, there are 4 modalities (T1, T1-contrast (T1c), T2, and FLAIR) together with the ground truth. The label contains 5 classes: background, necrosis, edema, non-enhancing tumor, and enhancing tumor. The evaluation is performed on three different tumor sub-compartments: (1) the complete tumor (all four tumor sub-regions); (2) the tumor core (all tumor sub-regions except edema); (3) the enhancing tumor structure (only the enhancing tumor sub-region).
Fig. 5. Example results. Left to right: (a) FLAIR, (b) FLAIR with ground truth, (c) results of our method, (d) U-Net results, (e) boundary-aware [7] results. Best viewed in color: necrosis (red), edema (yellow), non-enhancing tumor (blue), and enhancing tumor (green). (Color figure online)
In our experiments, the 220 HG images are randomly split into three sets with a ratio of 6:2:2, giving 132 training, 44 validation, and 44 test images. For all MR images, voxel intensities are normalized based on the mean and variance of the training set. We use 2D axial slices from MR volumes as input, and each slice is cropped to \(192\times 200\). Besides, the symmetric intensity difference map [8] of each slice is also fed into the network, resulting in 8 input channels. We use the exponent factor \(\beta = 2\) and dilation ratio \(\alpha = 0.6\). Each structuring element in Fig. 4(g) to (j) is applied for 15 epochs; the matrix in Fig. 4(f) is not used in our experiments. The model is implemented with Keras on a TensorFlow backend and trained for 60 epochs using the Adam optimizer, with learning rate \(8\times 10^{-5}\).
The evaluation results on the 44 test images are shown in Table 2 for the three tumor sub-regions. The hyper-parameters of the models mentioned in Table 2 are identical to the proposed ones. On top of U-Net, the FDL and image dilation show improvements especially on rather small regions like the tumor core and enhancing tumor. This demonstrates the capability of the FDL in improving the accuracy of classes with lower Dice. Our proposed method, which combines the FDL and dilation, outperforms the other methods in average Dice over the three tumor regions. Example results are annotated in Fig. 5. Our method achieves better high-level understanding instead of being misled by complex details. [7] generates a smooth boundary for the entire tumor but not for each tumor sub-region, and our method also outperforms it on some disconnected components.
Besides the improvement in accuracy, one more advantage of our method is the low computational cost for new test images. Recent methods reported 8 min [6], 2 to 4 min [1], and 2 min [12], respectively, for the prediction of each 3D volume on a modern GPU. Our method takes only around 3 s on an NVIDIA Titan X (Pascal), including image normalization and the computation of symmetric difference maps.
3.1 Results on the Focal Dice Loss
We have tested the performance of the proposed method with different values of \(\beta \) in the FDL, as shown in Table 3. We plot the Dice curves of the 44 validation images during training. Note that the Dice in Figs. 6 and 7 is the average DICE over the 4 foreground classes, which differs from our evaluation metric of 3 regions.
3.2 Results on Dilation
We also conducted experiments to explore the properties of dilation on the ground truth. Table 4 shows that our model works best when \(\alpha = 0.6\). It is worth noticing that the stability of the network degrades when the dilation ratio is 0.45 or 1, as shown in Fig. 7. If the ground truth is dilated with a small ratio (\(\alpha = 0.45\)), the corresponding input images may be treated as noise during training, as the occurrence of dilated images is limited. For a large dilation ratio, like \(\alpha = 1\), the network likely experiences great changes when the structuring element is switched to a smaller one, resulting in the oscillation of the Dice curves.
4 Conclusion
We introduced the FDL to address the data imbalance in multimodal brain tumor segmentation, which focuses on the different objects of interest instead of the entire image (unlike the focal loss). The experiments show the capability of the FDL in improving the classes with lower accuracy. Dilation is also applied to training samples with a gradually downsized structuring element to enlarge and connect the tiny regions for better high-level feature extraction, yielding a coarse-to-fine, incremental training approach with the network structure unaffected. The performance of our method has been tested on the BRATS2015 dataset and achieves the state-of-the-art in Dice Coefficient with relatively low computational cost.
References
1. Kamnitsas, K., et al.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017)
2. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Advances in Neural Information Processing Systems, pp. 109–117 (2011)
3. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
4. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
5. Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015)
6. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans. Med. Imaging 35(5), 1240–1251 (2016)
7. Shen, H., Wang, R., Zhang, J., McKenna, S.J.: Boundary-aware fully convolutional network for brain tumor segmentation. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 433–441. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_49
8. Shen, H., Zhang, J., Zheng, W.: Efficient symmetry-driven fully convolutional network for multimodal brain tumor segmentation. In: ICIP, pp. 3864–3868 (2017)
9. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
10. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS 2017. LNCS, vol. 10553, pp. 240–248. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_28
11. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
12. Zhao, X., Wu, Y., Song, G., Li, Z., Zhang, Y., Fan, Y.: A deep learning model integrating FCNNs and CRFs for brain tumor segmentation. Med. Image Anal. 43, 98–111 (2018)
Wang, P., Chung, A.C.S.: Focal Dice Loss and Image Dilation for Brain Tumor Segmentation. In: Stoyanov, D., et al. (eds.) DLMIA/ML-CDS 2018. LNCS, vol. 11045. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_14