
Medical Image Analysis

Volume 59, January 2020, 101587

‘Squeeze & excite’ guided few-shot segmentation of volumetric images

https://doi.org/10.1016/j.media.2019.101587

Highlights

  • We present the first few-shot segmentation framework for volumetric medical scans.

  • We introduce strong interactions at multiple locations between the conditioner and segmenter arms, instead of only one interaction at the final layer.

  • ‘Channel squeeze & spatial excitation’ modules for effectuating the interaction.

  • Stable training of few-shot segmenter from scratch without requiring a pre-trained model.

  • A volumetric segmentation strategy that optimally pairs the slices of query and support volumes.

Abstract

Deep neural networks enable highly accurate image segmentation, but require large amounts of manually annotated data for supervised training. Few-shot learning aims to address this shortcoming by learning a new class from a few annotated support examples. We introduce a novel few-shot framework for the segmentation of volumetric medical images with only a few annotated slices. Compared to other related works in computer vision, the major challenges are the absence of pre-trained networks and the volumetric nature of medical scans. We address these challenges by proposing a new architecture for few-shot segmentation that incorporates ‘squeeze & excite’ blocks. Our two-armed architecture consists of a conditioner arm, which processes the annotated support input and generates a task-specific representation. This representation is passed on to the segmenter arm, which uses this information to segment the new query image. To facilitate efficient interaction between the conditioner and the segmenter arm, we propose to use ‘channel squeeze & spatial excitation’ blocks – a light-weight computational module – that enable heavy interaction between both arms with a negligible increase in model complexity. This contribution allows us to perform image segmentation without relying on a pre-trained model, which is generally unavailable for medical scans. Furthermore, we propose an efficient strategy for volumetric segmentation by optimally pairing a few slices of the support volume with all the slices of the query volume. We perform experiments for organ segmentation on whole-body contrast-enhanced CT scans from the Visceral dataset. Our proposed model outperforms multiple baselines and existing approaches in segmentation accuracy by a significant margin. The source code is available at https://github.com/abhi4ssj/few-shot-segmentation.

Introduction

Fully convolutional neural networks (F-CNNs) have achieved state-of-the-art performance in semantic image segmentation for both natural (Jégou et al., 2017; Zhao et al., 2017; Long et al., 2015; Noh et al., 2015) and medical images (Ronneberger et al., 2015; Milletari et al., 2016). Despite their tremendous success in image segmentation, they are of limited use when only a few labeled images are available. F-CNNs are in general highly complex models with millions of trainable weight parameters that require thousands of densely annotated images to train effectively. A better strategy can be to adapt an already trained F-CNN model to segment a new semantic class from a few labeled images. This strategy often works well in computer vision applications, where a pre-trained model provides a good initialization and is subsequently fine-tuned with the new data to tailor it to the new semantic class. However, fine-tuning an existing pre-trained network without risking over-fitting still requires a fair amount of annotated images (at least on the order of hundreds). In an extremely low-data regime, where only a single or a few annotated images of the new class are available, such fine-tuning-based transfer learning often fails and may cause overfitting (Shaban et al., 2017; Rakelly et al., 2018).

Few-shot learning is a machine learning technique that aims to rapidly generalize an existing model to an unknown semantic class from only a few examples (Fei-Fei et al., 2006; Miller et al., 2000; Fei-Fei, 2006). The basic concept of few-shot learning is motivated by the learning process of humans, who learn new semantics rapidly from very few observations by leveraging strong prior knowledge acquired from past experience. While few-shot learning for image classification and object detection is a well-studied topic, few-shot learning for semantic image segmentation with neural networks has only recently been proposed (Shaban et al., 2017; Rakelly et al., 2018). Making dense pixel-level, high-dimensional predictions in such an extremely low-data regime is an immensely challenging task. At the same time, few-shot learning could have a big impact on medical image analysis because it addresses learning from scarcely annotated data, which is the norm due to the dependence on medical experts for manual labeling. In this article, we propose a few-shot segmentation framework designed exclusively for segmenting volumetric medical scans. Key to achieving this goal is the integration of the recently proposed ‘squeeze & excite’ blocks (Roy et al., 2018b) within the design of our novel few-shot architecture.

Few-shot learning algorithms try to generalize a model to a new, previously unseen class with only a few labeled examples by utilizing knowledge previously acquired from differently labeled training data. Fig. 1 illustrates the overall setup, where we want to segment the liver in a new scan given the annotation of the liver in only a single slice. A few-shot segmentation network architecture (Shaban et al., 2017; Rakelly et al., 2018) commonly consists of three parts: (i) a conditioner arm, (ii) a set of interaction blocks, and (iii) a segmenter arm. During inference, the model is provided with a support set (Is, Ls(α)), consisting of an image Is with the new semantic class (or organ) α outlined as a binary mask Ls(α). In addition, a query image Iq is provided, in which the new semantic class is to be segmented. The conditioner takes in the support set and performs a forward pass, generating feature maps in all the intermediate layers of the conditioner arm. This set of feature maps is referred to as the task representation, as it encodes the information required to segment the new semantic class. The task representation is taken up by the interaction blocks, whose role is to pass the relevant information to the segmenter arm. The segmenter arm takes the query image as input, leverages the task information provided by the interaction blocks, and generates a segmentation mask Mq for the query input Iq. Thus, the interaction blocks pass information from the conditioner to the segmenter and form the backbone of few-shot semantic image segmentation. Existing approaches use weak interactions with a single connection either at the bottleneck or the last layer of the network (Shaban et al., 2017; Rakelly et al., 2018).
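The data flow of such an episode can be sketched in a few lines. Everything below is purely illustrative: the downsampling stands in for learned convolutions, and the element-wise modulation stands in for a real interaction block.

```python
import numpy as np

def conditioner(support_img, support_mask):
    # Inject the annotation by masking the support image, then produce
    # one feature map per block (toy downsampling in place of learned
    # convolutions) -- the "task representation".
    x = support_img * support_mask
    return [x[::2, ::2], x[::4, ::4]]

def interaction(seg_feat, cond_feat):
    # Placeholder interaction block: element-wise modulation of the
    # segmenter features by the conditioner's task representation.
    return seg_feat * (1.0 + cond_feat)

def segmenter(query_img, task_repr):
    # Segment the query image, consuming the task representation at
    # every interaction point rather than only at the final layer.
    feat = query_img[::2, ::2]
    feat = interaction(feat, task_repr[0])
    feat = feat[::2, ::2]
    feat = interaction(feat, task_repr[1])
    return (feat > feat.mean()).astype(np.uint8)  # toy binary mask

rng = np.random.default_rng(0)
support_img = rng.random((64, 64))
support_mask = (rng.random((64, 64)) > 0.5).astype(float)
query_img = rng.random((64, 64))

task_repr = conditioner(support_img, support_mask)
mask_q = segmenter(query_img, task_repr)
print(mask_q.shape)  # (16, 16)
```

The point of the sketch is the wiring: the support set only ever enters through the conditioner, and the query prediction depends on it solely via the interaction blocks.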

Existing work in computer vision on few-shot segmentation processes 2D RGB images and uses a pre-trained model for both the segmenter and conditioner arms to aid training (Shaban et al., 2017; Rakelly et al., 2018). Pre-trained models provide strong prior knowledge with more powerful features from the start of training; hence, a weak interaction between conditioner and segmenter is sufficient to train the model effectively. The direct extension to medical images is challenging due to the lack of pre-trained models. Instead, both the conditioner and the segmenter need to be trained from scratch. However, training the network from scratch with only a weak interaction is prone to instability and mode collapse.

Instead of a weak interaction, we propose strong interactions at multiple locations between both arms. The strong interaction facilitates effective gradient flow across the two arms, which eases training both arms without the need for any pre-trained model. For effectuating the interaction, we build on our recently introduced ‘channel squeeze & spatial excitation’ (sSE) module (Roy et al., 2018a, 2018b). In our previous work, we used sSE blocks for adaptive self re-calibration of feature maps within a single segmentation network. Here, we use sSE blocks to communicate between the two arms of the few-shot segmentation network. The block takes the learned conditioner feature map as input and performs a ‘channel squeeze’ to learn a spatial map, which is then used to perform ‘spatial excitation’ on the segmenter feature map. We place sSE blocks between all the encoder, bottleneck and decoder blocks. SE blocks are well suited for effectuating the interaction between arms, as they are light-weight and therefore only marginally increase the model complexity. Despite their light-weight nature, they can have a strong impact on the segmenter’s features via re-calibration.
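The cross-arm sSE operation described above can be sketched in NumPy. The single 1×1-convolution weight vector with a sigmoid follows the ‘channel squeeze & spatial excitation’ description; the channel counts and random weights are illustrative stand-ins for learned parameters.

```python
import numpy as np

def sse_interaction(cond_feat, seg_feat, w, b):
    """Cross-arm 'channel squeeze & spatial excitation' sketch.

    cond_feat: conditioner feature map, shape (C_cond, H, W)
    seg_feat:  segmenter feature map,   shape (C_seg, H, W)
    w, b:      1x1-conv weights (C_cond,) and scalar bias -- the only
               parameters of the block, hence its light weight.
    """
    # Channel squeeze: a 1x1 convolution collapses the conditioner
    # channels into a single spatial map of shape (H, W).
    spatial_map = np.tensordot(w, cond_feat, axes=([0], [0])) + b
    spatial_map = 1.0 / (1.0 + np.exp(-spatial_map))  # sigmoid -> (0, 1)
    # Spatial excitation: re-calibrate every segmenter channel with the
    # same spatial map (broadcast over the channel axis).
    return seg_feat * spatial_map[None, :, :]

rng = np.random.default_rng(0)
cond = rng.standard_normal((16, 8, 8))  # 16 conditioner channels
seg = rng.standard_normal((64, 8, 8))   # 64 segmenter channels
out = sse_interaction(cond, seg, rng.standard_normal(16), 0.0)
print(out.shape)  # (64, 8, 8): segmenter shape is preserved
```

Because the sigmoid output lies in (0, 1), the block can only attenuate segmenter activations spatially; it adds just C_cond + 1 parameters per interaction point.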

Existing work on few-shot segmentation focuses on 2D images, while we are dealing with volumetric medical scans. Manually annotating organs on all slices of a 3D image is time-consuming; following the idea of few-shot learning, the annotation should rather happen on a few sparsely selected slices. To this end, we propose a volumetric segmentation strategy that properly pairs a few annotated slices of the support volume with all the slices of the query volume, maintaining inter-slice consistency of the segmentation.
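A minimal sketch of one such pairing, grouping query slices by relative position along the volume axis, is shown below. This is only an illustrative scheme under the assumption that both volumes cover the organ over comparable extents; the optimal pairing used in the paper is described in the method section.

```python
def pair_slices(num_query_slices, support_indices):
    """Assign every query slice to one of the k annotated support
    slices by relative position along the volume axis, so that
    neighbouring query slices share the same (or an adjacent) support
    slice, keeping the segmentation consistent across slices."""
    k = len(support_indices)
    pairing = {}
    for q in range(num_query_slices):
        # Map the query slice's relative position to a support group.
        g = min(int(q / num_query_slices * k), k - 1)
        pairing[q] = support_indices[g]
    return pairing

# k = 3 annotated support slices for a 9-slice query volume:
print(pair_slices(9, [10, 20, 30]))
# {0: 10, 1: 10, 2: 10, 3: 20, 4: 20, 5: 20, 6: 30, 7: 30, 8: 30}
```

Each support slice thus conditions a contiguous block of query slices, which is what preserves inter-slice consistency.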

In this work, we propose:

  1. A novel few-shot segmentation framework for volumetric medical scans.

  2. Strong interactions at multiple locations between the conditioner and segmenter arms, instead of only one interaction at the final layer.

  3. ‘Squeeze & excitation’ modules for effectuating the interaction.

  4. Stable training from scratch without requiring a pre-trained model.

  5. A volumetric segmentation strategy that optimally pairs the slices of query and support volumes.

We discuss related work in Section 2, present our few-shot segmentation algorithm in Section 3, the experimental setup in Section 4 and experimental results and discussion in Section 5. We conclude with a summary of our contributions in Section 6.

Section snippets

Few-shot learning

Methods for few-shot learning can be broadly divided into three groups. The first group of methods adapts a base classifier to the new class (Bart and Ullman, 2005; Fei-Fei et al., 2006; Hariharan and Girshick, 2017). These approaches are often prone to overfitting, as they attempt to fit a complex model to a few new samples. Methods in the second group aim to predict classifiers close to the base classifier to prevent overfitting. The basic idea is to use a two-branch network, where the

Method

In this section, we first introduce the problem setup, then detail the architecture of our network and the training strategy, and finally describe the evaluation strategy for segmenting volumetric scans.

Dataset description

We choose the challenging task of organ segmentation from contrast-enhanced CT (ceCT) scans for evaluating our few-shot volumetric segmentation framework. We use the Visceral dataset (Jimenez-del Toro et al., 2016), which consists of two parts: (i) the silver corpus (65 scans) and (ii) the gold corpus (20 scans). All scans were resampled to a voxel resolution of 2 mm³.

Problem formulation

As there is no existing benchmark for few-shot image segmentation on volumetric medical images, we formulate our own

‘Squeeze & excitation’ based interaction

In this section, we investigate the optimal positions of the SE blocks for facilitating interaction and compare the performance of cSE and sSE blocks. Here, we set the number of convolution kernels of the conditioner arm to 16 and the segmenter arm to 64. We use k=12 support slices from the support volume. Since the aim of this experiment is to evaluate the position and the type of SE blocks, we keep the above parameters fixed, but evaluate them later. With four different possibilities of
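For reference, the cSE variant compared in this experiment is the complement of sSE: it squeezes spatially and excites channel-wise. A minimal sketch follows, with illustrative bottleneck size and random weights standing in for learned parameters.

```python
import numpy as np

def cse_interaction(cond_feat, seg_feat, w1, w2):
    """'Spatial squeeze & channel excitation' (cSE) sketch.

    cond_feat: conditioner feature map, shape (C_cond, H, W)
    seg_feat:  segmenter feature map,   shape (C_seg, H, W)
    w1, w2:    fully connected weights of the excitation bottleneck.
    """
    # Spatial squeeze: global average pooling -> one value per channel.
    z = cond_feat.mean(axis=(1, 2))                 # (C_cond,)
    # Excitation: bottleneck FC with ReLU, then FC with sigmoid.
    h = np.maximum(w1 @ z, 0.0)                     # (bottleneck,)
    scale = 1.0 / (1.0 + np.exp(-(w2 @ h)))         # (C_seg,) in (0, 1)
    # Channel excitation: rescale each segmenter channel uniformly
    # over space (broadcast over the spatial axes).
    return seg_feat * scale[:, None, None]

rng = np.random.default_rng(1)
cond = rng.standard_normal((16, 8, 8))   # 16 conditioner channels
seg = rng.standard_normal((64, 8, 8))    # 64 segmenter channels
w1 = rng.standard_normal((8, 16))        # squeeze to an 8-unit bottleneck
w2 = rng.standard_normal((64, 8))        # expand to segmenter channels
out = cse_interaction(cond, seg, w1, w2)
print(out.shape)  # (64, 8, 8)
```

The contrast with sSE is where the modulation varies: cSE applies one scalar per channel everywhere in space, whereas sSE applies one scalar per spatial location across all channels.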

Conclusion

In this article, we introduced a few-shot segmentation framework for volumetric medical scans. The main challenges were the absence of pre-trained models to start from, and the volumetric nature of the scans. We proposed to use ‘channel squeeze and spatial excitation’ blocks for aiding proper training of our framework from scratch. In addition, we proposed a volumetric segmentation strategy for segmenting a query volume scan with a support volume scan by strategically pairing 2D slices

Declaration of Competing Interest

The authors declare that they do not have any financial or non-financial conflicts of interest.

Acknowledgement

We thank SAP SE and the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B) for funding, and the NVIDIA Corporation for the GPU donation.

References (26)

  • E. Bart et al., Cross-generalization: learning novel classes from a single example by feature replacement, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2005)
  • L. Bertinetto et al., Learning feed-forward one-shot learners, Advances in Neural Information Processing Systems (2016)
  • S. Caelles et al., One-shot video object segmentation, CVPR (2017)
  • N. Dong et al., Few-shot semantic segmentation with prototype learning, BMVC (2018)
  • L. Fei-Fei, Knowledge transfer in learning to recognize visual object classes, Proceedings of the International Conference on Development and Learning (ICDL) (2006)
  • L. Fei-Fei et al., One-shot learning of object categories, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • B. Hariharan et al., Low-shot visual recognition by shrinking and hallucinating features, Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy (2017)
  • K. He et al., Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision (2015)
  • J. Hu et al., Squeeze-and-excitation networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)
  • S. Jégou et al., The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation, IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2017)
  • O. Jimenez-del Toro et al., Cloud-based evaluation of anatomical structure segmentation and landmark detection algorithms: VISCERAL anatomy benchmarks, IEEE Trans. Med. Imag. (2016)
  • G. Koch et al., Siamese neural networks for one-shot image recognition, ICML Deep Learning Workshop (2015)
  • J. Long et al., Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

A. Guha Roy, S. Siddiqui and S. Pölsterl have contributed equally to this work.
