Knowledge transfer between brain lesion segmentation tasks with increased model capacity

https://doi.org/10.1016/j.compmedimag.2020.101842

Highlights

  • We address the problem of scarce annotated data for brain lesion segmentation.

  • Knowledge transfer between brain lesion segmentation tasks is proposed.

  • A fine-tuning strategy with increased model capacity is developed.

  • We also introduce a spatially adaptive mechanism for the model capacity increase.

  • Our method achieves better performance on ischemic stroke lesion segmentation.

Abstract

Convolutional neural networks (CNNs) have become an increasingly popular tool for brain lesion segmentation in recent years due to their accuracy and efficiency. However, CNN-based brain lesion segmentation generally requires a large amount of annotated training data, which can be costly to obtain in medical imaging. In many scenarios, only a few annotations of brain lesions are available. One common strategy to address the issue of limited annotated data is to transfer knowledge from a different yet relevant source task, where training data is abundant, to the target task of interest. Typically, a model is pretrained for the source task and then fine-tuned with the scarce training data associated with the target task. However, classic fine-tuning tends to make small modifications to the pretrained model, which could hinder its adaptation to the target task. Fine-tuning with increased model capacity has been shown to alleviate this negative impact in image classification problems. In this work, we extend the strategy of fine-tuning with increased model capacity to the problem of brain lesion segmentation, and then develop an advanced version that is better suited to segmentation problems. First, we propose a vanilla strategy of increasing the capacity, where, as in the classification problem, the width of the network is augmented during fine-tuning. Second, because in segmentation problems, unlike image classification, each voxel is associated with a labeling result, we further develop a spatially adaptive augmentation strategy during fine-tuning. Specifically, in addition to the vanilla width augmentation, we incorporate a module that computes a spatial map of how much the information given by width augmentation contributes to the final segmentation.
For demonstration, the proposed method was applied to ischemic stroke lesion segmentation, where a model pretrained for brain tumor segmentation was fine-tuned, and the experimental results indicate the benefit of our method.

Introduction

Automated brain lesion segmentation has great potential for guiding clinical diagnosis and treatment strategies. In particular, convolutional neural networks (CNNs) have achieved state-of-the-art segmentation performance. For example, Zhao et al. (2018) have proposed a unified CNN-based framework for brain tumor segmentation with appearance and spatial consistency. In Nair et al. (2020), multiple sclerosis (MS) lesions are segmented with CNNs and lesion-level uncertainties are explored. Kamnitsas et al. (2017) have developed an efficient multi-scale CNN, which achieves remarkable segmentation performance for traumatic brain injuries, brain tumors, and stroke lesions. Kervadec et al. (2019) have developed a CNN integrated with a boundary loss to improve stroke lesion segmentation and white matter hyperintensity (WMH) segmentation. However, the training of CNNs generally requires a large number of annotations, which can be costly to obtain in medical imaging. In many scenarios, only a few annotated images are available. Thus, it is highly desirable to develop CNN-based brain lesion segmentation methods that allow efficient network training with scarce annotated data.

Transfer learning is a commonly used strategy to address the problem of scarce annotated training data for deep networks, where knowledge learned from a source task with abundant annotations is transferred to a target task of interest with scarce annotations. Normally, the source task and the target task are different yet relevant. Fine-tuning is a typical and widely used strategy of transfer learning, where a model is pretrained for the source task and the knowledge learned for the source task is then transferred to the target task by fine-tuning the pretrained model with the limited annotations from the target dataset (Girshick et al., 2014). In the process of fine-tuning, parameters that are specific to the target task, such as the network weights in the last classification layer, are randomly initialized and optimized together with the other pretrained parameters. This fine-tuning strategy has been successfully applied to image classification and segmentation tasks. For example, Agrawal et al. (2014) show the effectiveness of fine-tuning a network that is pretrained on the ImageNet dataset (Deng et al., 2009) and fine-tuned for image classification and object detection on the SUN dataset (Xiao et al., 2010) and the PASCAL VOC dataset (Everingham et al., 2010). In Tajbakhsh et al. (2016), the impact of fine-tuning on the performance of CNNs is investigated across distinct medical imaging applications, including classification, detection, and segmentation, and the possibility of knowledge transfer from natural images to medical images is demonstrated. In Ghafoorian et al. (2017), fine-tuning is used for domain adaptation between two WMH datasets with different intensity distributions, achieving accurate WMH segmentation with only a few annotated training scans for the target task.
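The classic fine-tuning procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not the implementation used in this work: each layer is stored as a weight matrix in a dictionary, the task-specific head is identified by a hypothetical name passed as `head_name`, the new head uses small Gaussian initialization (scale 0.01, an arbitrary choice), and `sgd_step` is a plain SGD update used only for brevity in place of whatever optimizer is actually employed.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.01) -> Matrix:
    """Small Gaussian weights for a newly created, task-specific layer."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def init_classic_fine_tuning(pretrained: Dict[str, Matrix], head_name: str,
                             n_target_classes: int, seed: int = 0) -> Dict[str, Matrix]:
    """Copy all pretrained layers except the task-specific head, which is
    re-created with the target task's number of classes and random weights."""
    rng = random.Random(seed)
    params = {name: [row[:] for row in w]
              for name, w in pretrained.items() if name != head_name}
    in_channels = len(pretrained[head_name][0])  # input width of the source head
    params[head_name] = random_matrix(n_target_classes, in_channels, rng)
    return params


def sgd_step(params: Dict[str, Matrix], grads: Dict[str, Matrix], lr: float) -> Dict[str, Matrix]:
    """One plain SGD update applied jointly to pretrained and new parameters."""
    return {name: [[w - lr * g for w, g in zip(w_row, g_row)]
                   for w_row, g_row in zip(params[name], grads[name])]
            for name in params}
```

For example, `init_classic_fine_tuning(pretrained, "head", n_target_classes=2)` starts every other layer from its pretrained weights and gives the head a fresh two-row weight matrix; subsequent `sgd_step` calls then update all layers together, which is what distinguishes fine-tuning from training only the new head.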

The transfer learning strategy is also applicable to brain lesion segmentation when only limited annotations are available for a task of interest, because there are publicly available datasets that include a substantial number of scans with annotated brain lesions (Menze et al., 2015, Maier et al., 2017, Kuijf et al., 2019). However, despite its widespread use, the classic fine-tuning strategy described above has been observed to be suboptimal for image classification: it tends to make small modifications to the pretrained model, which could hinder its adaptation to the target task (Wang et al., 2017). Instead, an increase in the network capacity during fine-tuning can improve the performance on the target classification task. In Wang et al. (2017), width-augmented and depth-augmented networks with additional randomly initialized units are proposed. Such an increase in model capacity is inspired by developmental learning in cognitive science (Luciana and Nelson, 2001), and it allows existing units to better adapt to the target task. The strategy of increasing model capacity may also benefit brain lesion segmentation problems, yet this has not been previously explored. In addition, existing strategies of model capacity increase were developed for image classification and may not be optimal for image segmentation problems.

In this paper, to address the problem of scarce annotated training data for brain lesion segmentation, we explore the strategy of fine-tuning with increased model capacity for knowledge transfer between tasks of brain lesion segmentation. In particular, we focus on the situation where only a pretrained model is available and access to the training data for the source task is not guaranteed, which is likely to happen due to privacy or other practical concerns (Burton et al., 2015, Micaelli and Storkey, 2019). In this setting, multi-task learning (Caruana, 1997) or retraining a different model for the source task is not feasible. Since Wang et al. (2017) have shown that width augmentation is superior to depth augmentation, we focus on the development of width-augmented networks. First, motivated by Wang et al. (2017), we develop a vanilla width augmentation strategy for brain lesion segmentation, where the number of channels in the penultimate layer (the layer before the final prediction layer) is increased during fine-tuning. Moreover, unlike in image classification problems, in brain lesion segmentation each voxel is associated with a labeling result, and different spatial locations may require different contributions of information from the augmented units. Therefore, we further develop a spatially adaptive width augmentation strategy. Specifically, we propose a module that computes a spatial map specifying how much information from the augmented units should be used at each voxel. In this way, the spatially adaptive strategy of model capacity increase is better suited to brain lesion segmentation than the vanilla width augmentation strategy.
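A minimal sketch of vanilla width augmentation of the penultimate layer, written in plain Python for readability. The modeling choices are our assumptions, not details from this work: every layer is a per-voxel linear map (equivalent to a 1×1×1 convolution) followed by ReLU, the volume is a flattened list of per-voxel feature vectors, the new channels and the prediction layer use small Gaussian initialization, and the class name `WidthAugmentedHead` is hypothetical. A real network would use 3D convolutions in a deep-learning framework.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Per-voxel linear map, i.e. a 1x1x1 convolution applied at one voxel."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


class WidthAugmentedHead:
    """Penultimate layer widened by `n_new` randomly initialized channels.

    Hypothetical sketch: the pretrained penultimate weights `w_pre` (C x in_ch)
    are kept, K = n_new new channels read the same input, and the prediction
    layer sees the concatenated C + K channels."""

    def __init__(self, w_pre: Matrix, n_new: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        in_ch = len(w_pre[0])
        self.w_pre = w_pre  # pretrained; still trainable during fine-tuning
        self.w_new = [[rng.gauss(0.0, 0.01) for _ in range(in_ch)] for _ in range(n_new)]
        width = len(w_pre) + n_new
        self.w_out = [[rng.gauss(0.0, 0.01) for _ in range(width)] for _ in range(n_classes)]

    def forward_voxel(self, x: Vector) -> Vector:
        f_pre = relu(matvec(self.w_pre, x))  # pretrained channels
        f_new = relu(matvec(self.w_new, x))  # augmented channels
        return softmax(matvec(self.w_out, f_pre + f_new))  # list + list = channel concat

    def forward(self, volume: List[Vector]) -> List[Vector]:
        """Apply the head independently at every voxel of a flattened volume."""
        return [self.forward_voxel(x) for x in volume]
```

Note that the same augmented channels contribute with the same prediction weights at every voxel; this uniform use of the new units is the property the spatially adaptive variant relaxes.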

The contributions of our work are summarized as follows:

  • We investigate knowledge transfer with increased model capacity between two different brain lesion segmentation tasks, which has not been explored previously. Compared with the classic fine-tuning method, this strategy enables the segmentation network to better adapt to the target task.

  • We further propose a spatially adaptive mechanism of model capacity increase, which takes into consideration the fact that each voxel is associated with a labeling result and thus is more appropriate for segmentation problems.

  • We show experimentally the benefit of the proposed strategy of fine-tuning with increased model capacity. Specifically, we applied the proposed method to the segmentation of ischemic stroke lesions, where a model was pretrained using the BraTS dataset (Menze et al., 2015) for brain tumor segmentation and then fine-tuned with annotated stroke lesions. Results indicate that the segmentation accuracy is improved with the proposed method.
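The spatially adaptive mechanism can be illustrated with a per-voxel gate. The source only states that a module computes a spatial map of the contribution of the augmented information at each voxel; the specific form below is our hypothetical choice for illustration: a scalar alpha(x) = sigmoid(g · [f_pre; f_new] + b) computed from the concatenated features, which scales the augmented branch's logits. The names `spatially_adaptive_forward`, `g`, and `b` are illustrative, and layers are again modeled as per-voxel linear maps.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Per-voxel linear map (a 1x1x1 convolution at one voxel)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


def softmax(v: Vector) -> Vector:
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def spatially_adaptive_forward(volume: List[Vector], w_pre: Matrix, w_new: Matrix,
                               w_out_pre: Matrix, w_out_new: Matrix,
                               g: Vector, b: float) -> Tuple[List[Vector], List[float]]:
    """Return per-voxel class probabilities and the contribution map alpha.

    Hypothetical gating form: alpha(x) = sigmoid(g . [f_pre; f_new] + b)
    scales the augmented branch's logits separately at every voxel."""
    probs, alpha_map = [], []
    for x in volume:
        f_pre = relu(matvec(w_pre, x))  # pretrained channels
        f_new = relu(matvec(w_new, x))  # augmented channels
        alpha = sigmoid(sum(gi * fi for gi, fi in zip(g, f_pre + f_new)) + b)
        logits = [p + alpha * n
                  for p, n in zip(matvec(w_out_pre, f_pre), matvec(w_out_new, f_new))]
        probs.append(softmax(logits))
        alpha_map.append(alpha)
    return probs, alpha_map
```

With alpha equal to 1 everywhere, the logits reduce to `w_out_pre f_pre + w_out_new f_new`, which is exactly a prediction layer applied to the concatenated channels, i.e. the vanilla width augmentation; with alpha near 0 only the pretrained channels are used. A learned map lets this weighting vary from voxel to voxel.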

The rest of this paper is organized as follows. Section 2 introduces the proposed strategy of knowledge transfer between brain lesion segmentation tasks. Section 3 provides experimental evidence that demonstrates the benefit of the proposed approach. In Section 4, discussion on the results and future work is given. Section 5 gives a summary of this paper.


Methods

In this section, we first formulate the problem of knowledge transfer between brain lesion segmentation tasks and introduce the classic fine-tuning strategy. Then, we describe the proposed method of fine-tuning with increased model capacity, which addresses the limitations of classic fine-tuning. Finally, we present the details of implementation.

Results

In this section, we present the evaluation of the proposed approach. For demonstration, the proposed method was applied to ischemic stroke lesion segmentation, which is the target task. We selected brain tumor segmentation as the source task due to the publicly available tumor annotations in the BraTS dataset (Menze et al., 2015). All experiments were performed on an NVIDIA Tesla V100 GPU.

Discussion

For brain lesion segmentation tasks, the availability of a large number of annotated training scans may not always be guaranteed. In such cases, fine-tuning a model that is pretrained with abundant annotations for a different yet relevant task could substantially improve the segmentation performance. However, since classic fine-tuning may limit the adaptation of the pretrained model to the task of interest, motivated by Wang et al. (2017), we explore model capacity increase using network width

Conclusion

We have explored knowledge transfer between tasks of brain lesion segmentation for improved segmentation performance when a limited number of annotated training scans are available. Specifically, a fine-tuning strategy with network augmentation is developed, where the network width is increased during fine-tuning. In addition, a spatially adaptive mechanism is proposed to allow a more flexible use of the augmented information. Using a model pretrained by publicly available brain tumor

Authors’ contribution

Yanlin Liu: writing – original draft, methodology, software, validation, investigation, formal analysis, visualization. Wenhui Cui: writing – original draft, methodology, validation. Qing Ha: investigation, resources. Xiaoliang Xiong: investigation, resources. Xiangzhu Zeng: conceptualization, investigation, data curation. Chuyang Ye: conceptualization, methodology, writing – review & editing, supervision, project administration, funding acquisition.

Acknowledgment

This work is supported by the Beijing Natural Science Foundation (L192058 and 7192108) and Beijing Institute of Technology Research Fund Program for Young Scholars.

References (43)

  • R. Caruana. Multitask learning. Mach. Learn. (1997).
  • X. Chen et al. Catastrophic forgetting meets negative transfer: batch spectral shrinkage for safe transfer learning. Adv. Neural Inform. Process. Syst. (2019).
  • Ö. Çiçek et al. 3D U-Net: learning dense volumetric segmentation from sparse annotation. International Conference on Medical Image Computing and Computer-Assisted Intervention (2016).
  • W. Cui et al. Semi-supervised brain lesion segmentation with an adapted mean teacher model. International Conference on Information Processing in Medical Imaging (2019).
  • J. Deng et al. ImageNet: a large-scale hierarchical image database. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2009).
  • M. Everingham et al. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vision (2010).
  • M. Ghafoorian et al. Transfer learning for domain adaptation in MRI: application in brain lesion segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention (2017).
  • R. Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014).
  • H. Kervadec et al. Boundary loss for highly unbalanced segmentation. International Conference on Medical Imaging with Deep Learning (2019).
  • D.P. Kingma et al. Adam: a method for stochastic optimization (2014).
  • J. Kirkpatrick et al. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. U.S.A. (2017).

1 These authors have contributed equally to this work.
