DGMSNet: Spine segmentation for MR image by a detection-guided mixed-supervised segmentation network
Introduction
Spine segmentation for magnetic resonance (MR) images, i.e., multi-class segmentation of the vertebral bodies (VBs) and intervertebral discs (IVDs), plays a significant role in diagnosing spine diseases, planning surgical treatment, locating spine pathologies (Chang et al., 2020; Pang et al., 2021), and estimating spine indices (Pang et al., 2018, 2019; Lin et al., 2020, 2021). Specifically, spine segmentation of the 2D mid-sagittal MR image can assist physicians in grading disc herniation (Fardon, 2001; Williams et al., 2014). Nevertheless, manual spine segmentation is tedious, time-consuming, and subject to inter- and intra-observer variability driven by differences in expertise. Automated spine segmentation offers a way to circumvent these issues.
Inter-class similarity, i.e., the similarity in shape and appearance between neighboring vertebrae (or intervertebral discs) of a subject, is an intractable challenge in spine segmentation for MR images (Pang et al., 2021). A feasible way to reduce its impact is to enhance the semantic information extracted by the segmentation network. Han et al. (2018) introduced long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) into a segmentation network named Spine-GAN, which exploits the long-range spatial correlations among pixels in the feature maps to generate a semantic image representation for segmenting multiple spinal structures. Moreover, some researchers (Chang et al., 2020; Pang et al., 2021) employed the graph convolutional network (GCN) (Kipf and Welling, 2016) to generate a semantic image representation for spine segmentation by capturing the spatial correlations between spinal structures. Ordinarily, annotating a spine image at the pixel level is costly, whereas obtaining keypoints-detection annotations is cheap. The aforementioned spine segmentation approaches rely on fully supervised learning, which limits their generalization when the pixel-level annotated dataset is inadequate.
Given a pixel-level annotated dataset, termed the strongly-supervised dataset (see Fig. 1(a)), and a keypoints-detection annotated dataset, termed the weakly-supervised dataset (see Fig. 1(b)), this paper studies how to improve the generalization of a spine segmentation model by exploiting the semantic information contained in the weakly-supervised dataset.
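The weakly-supervised dataset carries only keypoint coordinates. A common way to turn such labels into a regression target for a detection network is to render each keypoint as a 2D Gaussian heatmap; the sketch below illustrates this encoding (the Gaussian form and the sigma value are our assumptions for illustration, not details taken from the paper):

```python
import numpy as np

def keypoint_heatmap(h, w, cy, cx, sigma=3.0):
    """Render a 2D Gaussian centered at (cy, cx): a common regression
    target for keypoint detection (sigma is a free design choice)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

# One heatmap channel per annotated spinal keypoint (e.g., a VB or IVD center).
hm = keypoint_heatmap(64, 64, 20, 30)
assert hm[20, 30] == hm.max() == 1.0  # peak of 1.0 exactly at the keypoint
```

A detection path trained to regress such heatmaps can then be supervised by both datasets, since keypoint coordinates are available (or derivable from masks) in either case.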
Work related to this study falls into three areas: mixed-supervised segmentation, multi-task learning, and loss-weight learning for multi-task learning.
Mixed-supervised segmentation combines a set of fully-annotated images with weakly-annotated images. The weak annotations usually take the form of bounding boxes (Wang et al., 2019, 2021; Shah et al., 2018), image-level labels (Hong et al., 2015; Mlynarski et al., 2019), pseudo masks (Luo and Yang, 2020), or boundary landmarks (Shah et al., 2018). Previous research on mixed-supervised segmentation has addressed the problem from a multi-task objective perspective. Specifically, Wang et al. (2019, 2021) proposed a Mixed-Supervised Dual-Network (MSDN) consisting of two separate networks, for segmentation and bounding-box detection respectively, linked by a series of connection modules between their layers; the detection network improved the performance of the segmentation network. Luo and Yang (2020) achieved mixed-supervised semantic segmentation via a strong-weak dual-branch network (SWDN). They utilized Deep Seeded Region Growing (DSRG) (Huang et al., 2018) to generate pseudo masks for the image-level annotated images, which were used to train the weak branch; the weak branch in turn regularized the strong branch and thereby improved semantic segmentation performance.
The abovementioned approaches suffer from two limitations: 1) they do not exploit keypoints-detection annotated datasets, which are cheap to annotate yet contain sufficient semantic information to assist spine segmentation; 2) the auxiliary task trained on the weakly-supervised dataset explicitly affects the main task (i.e., segmentation) only in the feature space, not in the prediction space. In contrast, our work exploits a keypoints-detection annotated dataset to facilitate spine segmentation, and the auxiliary task (i.e., keypoints detection) explicitly guides the main task in both the feature space and the prediction space. Moreover, no mixed-supervised segmentation network has yet been studied for spine segmentation in MR images.
Multi-task learning aims to improve network performance by learning multiple tasks simultaneously. Zhang et al. (2020) presented a multi-task relational learning network (MRLN) for vertebrae detection (i.e., localization and identification) and segmentation, introducing a co-attention module in the forward pass to learn the correlation between the two tasks and thereby alleviate single-task overfitting. Nie et al. (2018) proposed a parsing induced learner (PIL) that exploits human parsing information to assist keypoints detection via an adaptive convolution whose dynamic parameters are generated by the PIL. Inspired by PIL but working in the opposite direction, the proposed approach exploits the keypoints detection task to assist the parsing (i.e., segmentation) task.
The loss function in multi-task learning is usually a weighted combination of the per-task losses, and setting these loss weights is a challenge. Handcrafted task weighting (e.g., grid search) is simple, but its cost grows exponentially with the number of tasks. Consequently, a growing body of research addresses automated loss-weight learning. Existing approaches fall into three categories: gradient-based (Chen et al., 2018; Jha et al., 2020), uncertainty-based (Kendall et al., 2018), and loss-value-based (Liu et al., 2019) weight learning. In all of these approaches, a single fixed trained model is used at test time; every test image shares the same loss-weight learning procedure, which limits the generalization of the model. Moreover, these approaches introduce additional hyper-parameters, which is not cost-effective when only one loss weight needs to be learned.
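The exponential cost of handcrafted weighting is easy to see concretely. A minimal sketch of the usual weighted multi-task objective and of a grid search over per-task weights (the grid values here are arbitrary examples):

```python
import itertools

def total_loss(losses, weights):
    """Weighted combination of per-task losses: the usual multi-task objective."""
    return sum(w * l for w, l in zip(weights, losses))

# Handcrafted weighting via grid search: the number of candidate weight
# combinations grows exponentially with the number of tasks.
grid = [0.1, 1.0, 10.0]
n_tasks = 3
combos = list(itertools.product(grid, repeat=n_tasks))
print(len(combos))  # 3 ** 3 = 27 candidate weightings to train and evaluate
```

With two tasks and one weight fixed to 1 (the setting relevant to this paper), only a single weight remains, which is why heavyweight automated weight-learning machinery is not cost-effective here.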
In this study, a detection-guided mixed-supervised segmentation network (DGMSNet), shown in Fig. 2, is proposed for spine segmentation in MR images. DGMSNet comprises a segmentation path that generates the spine segmentation prediction and a detection path (i.e., a regression network) that produces heatmap predictions for the keypoints. The detection-guided learner (DGL) in the detection path generates dynamic parameters, which serve as the convolution kernel of an adaptive convolution that extracts semantic information for the segmentation path. The two paths are trained simultaneously, end to end: the segmentation path is trained on the strongly-supervised dataset, while the detection path is trained on both the strongly-supervised and weakly-supervised datasets. Note that the keypoint coordinates in the strongly-supervised dataset are derived from the masks of the spinal structures. The loss function is a weighted combination of the segmentation and detection losses. During training, a set of models is trained and saved under various loss-weight values. During inference, these trained models generate a set of segmentation predictions and heatmap predictions via the segmentation and detection paths respectively. Based on the assumption that the detection path outperforms the segmentation path, the final segmentation prediction is obtained by detection-guided label fusion according to the consistency between the heatmaps predicted by the detection path and those derived from the segmentation predictions.
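The mixed-supervision scheme above can be summarized in a few lines. This is a hedged sketch: the additive form L_total = L_seg + lam * L_det and the sweep values are our assumptions about the "weighted combination" and the "various values of loss weights", not the paper's exact formulation:

```python
# Assumed form of the weighted combination of segmentation and detection losses.
def mixed_loss(l_seg, l_det, lam, has_pixel_mask):
    # Weakly-supervised images carry no pixel mask, so they contribute
    # only the detection (heatmap regression) term.
    seg_term = l_seg if has_pixel_mask else 0.0
    return seg_term + lam * l_det

# One model is trained and saved per loss-weight value (values hypothetical);
# all saved models are later combined by detection-guided label fusion.
lambdas = [0.1, 0.5, 1.0, 2.0]
saved_models = {lam: f"dgmsnet_lam{lam}.pt" for lam in lambdas}
```

Training several models that differ only in the scalar loss weight is what later enables fusing their predictions at inference time instead of committing to one fixed weighting.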
The main contributions of this paper are listed as follows:
- We present a detection-guided mixed-supervised segmentation network (DGMSNet) to achieve spine segmentation for MR images. The generalization of the segmentation path in DGMSNet is improved under the guidance of the detection path in feature space with the weakly-supervised dataset.
- We introduce a detection-guided learner (DGL) to produce a semantic feature for spine segmentation, which alleviates the inter-class similarity and improves the performance of spine segmentation.
- We propose a detection-guided label fusion (DGLF) approach to obtain the final segmentation prediction in the inference phase by deciding whether to use majority voting or adaptive model selection according to the sensitivity of segmentation performance to the loss weight. The segmentation path is guided by the detection path in prediction space, which improves the generalization and robustness of the method.
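The two fusion primitives named in the contributions, majority voting and adaptive model selection, can be sketched as follows. The consistency measure (mean squared error between detection-path heatmaps and heatmaps derived from the segmentation prediction) is our assumption; the paper's DGLF may define consistency differently:

```python
import numpy as np

def majority_vote(label_maps):
    """Pixel-wise majority voting over the label maps predicted by the
    models trained with different loss-weight values."""
    stack = np.stack(label_maps).astype(int)          # (n_models, H, W)
    n_classes = stack.max() + 1
    # Count the votes for each class at every pixel, then take the winner.
    counts = np.apply_along_axis(np.bincount, 0, stack, minlength=n_classes)
    return counts.argmax(axis=0)

def select_model(seg_heatmaps, det_heatmaps):
    """Adaptive model selection: pick the model whose segmentation-derived
    heatmaps agree best (smallest MSE) with its detection-path heatmaps."""
    errs = [float(np.mean((s - d) ** 2)) for s, d in zip(seg_heatmaps, det_heatmaps)]
    return int(np.argmin(errs))
```

Per the DGLF contribution, which of the two primitives is applied depends on how sensitive segmentation performance is to the loss weight for the image at hand.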
Section snippets
DGMSNet
The proposed DGMSNet, as shown in Fig. 2, consists of a detection path and a segmentation path, each with its own network parameters (we omit the parameters hereinafter to simplify notation). The detection path aims at extracting a semantic feature that guides the segmentation path to generate an accurate segmentation result. Both paths utilize an encoder-decoder architecture. Moreover, the detection …
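The core mechanism linking the two paths is an adaptive convolution whose kernel is generated at run time by the DGL. A minimal numpy sketch of that idea, restricted to a 1x1 kernel for simplicity (the feature sizes and the linear learner are hypothetical, not the paper's architecture):

```python
import numpy as np

def adaptive_conv_1x1(feat, dyn_kernel):
    """1x1 convolution whose kernel is produced dynamically rather than
    stored as a fixed parameter. feat: (C_in, H, W); dyn_kernel: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', dyn_kernel, feat)

rng = np.random.default_rng(0)
det_feature = rng.normal(size=8)                  # pooled detection-path feature
learner_w = rng.normal(size=(4 * 8, 8))           # hypothetical DGL weight matrix
kernel = (learner_w @ det_feature).reshape(4, 8)  # dynamic (C_out, C_in) kernel
seg_feat = rng.normal(size=(8, 16, 16))           # segmentation-path feature map
out = adaptive_conv_1x1(seg_feat, kernel)
assert out.shape == (4, 16, 16)
```

Because the kernel depends on the detection-path feature of the current image, the semantic cues extracted for the segmentation path adapt per image, which is what distinguishes this from an ordinary convolution with fixed weights.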
Datasets
Two datasets denoted as Dataset-A and Dataset-B respectively were used to evaluate the segmentation performance of the proposed approach.
Overall performance
As shown in Fig. 5 and Fig. 6, the proposed DGMSNet achieves accurate spine segmentation for MR images. Specifically, DGMSNet achieves mean Precisions of , , and for VBs, IVDs, and all 10 spinal structures respectively; the corresponding mean Recalls are , , and . The Precision of IVDs segmentation is significantly lower than the corresponding Recall, which demonstrates that the false-positive rate of …
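The Precision/Recall relationship invoked above (Precision below Recall implies more false positives) can be reproduced with a small check; the toy label maps below are our own illustration, not the paper's data:

```python
import numpy as np

def precision_recall(pred, gt, cls):
    """Per-class Precision and Recall from predicted and ground-truth label maps."""
    tp = np.sum((pred == cls) & (gt == cls))
    fp = np.sum((pred == cls) & (gt != cls))
    fn = np.sum((pred != cls) & (gt == cls))
    return tp / (tp + fp), tp / (tp + fn)

# Class 1 is over-predicted: one true positive, one false positive, no misses,
# so Precision < Recall -- the pattern reported for IVD segmentation.
pred = np.array([[1, 1], [0, 2]])
gt   = np.array([[1, 0], [0, 2]])
p, r = precision_recall(pred, gt, 1)
print(p, r)  # 0.5 1.0
```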
Conclusion
We have presented an accurate and robust detection-guided mixed-supervised segmentation network (DGMSNet) to achieve spine segmentation for MR images. In the training phase, the proposed DGL learned the semantic information of spinal structures from the weakly-supervised dataset by the mixed-supervised learning strategy, which guided the segmentation path in feature space to generate accurate segmentation prediction. In the inference phase, based on Assumption 1, the DGLF was presented to …
CRediT authorship contribution statement
Shumao Pang: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing. Chunlan Pang: Investigation, Software, Writing – original draft. Zhihai Su: Resources, Data curation. Liyan Lin: Visualization. Lei Zhao: Visualization. Yangfan Chen: Visualization. Yujia Zhou: Writing – review & editing. Hai Lu: Resources, Data curation. Qianjin Feng: Supervision, Project administration, Funding acquisition, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
Financial support for this work was provided by the National Natural Science Foundation of China (No. 62001207, 81974275), the China Postdoctoral Science Foundation (No. 2020M672712), the Zhuhai City Innovation and Innovation Team Project, Guangdong Province, China (No. ZH0406190031PWC), and the Guangdong Provincial Key Laboratory of Medical Image Processing (No. 2020B1212060039). No other potential conflict of interest relevant to this article was reported.
References (34)
- Han et al. Spine-GAN: semantic segmentation of multiple spinal structures. Med Image Anal (2018).
- Heckemann et al. Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. NeuroImage (2006).
- Pang et al. Direct automated quantitative measurement of spine by cascade amplifier regression network with manifold regularization. Med Image Anal (2019).
- Wang et al. A novel dual-network architecture for mixed-supervised medical image segmentation. Computerized Medical Imaging and Graphics (2021).
- Zhang et al. MRLN: multi-task relational learning network for MRI vertebral localization, identification, and segmentation. IEEE J Biomed Health Inform (2020).
- Multi-vertebrae segmentation from arbitrary spine MR images under global view. International Conference on Medical Image Computing and Computer-Assisted Intervention (2020).
- Chen et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV) (2018).
- Chen et al. GradNorm: gradient normalization for adaptive loss balancing in deep multitask networks. International Conference on Machine Learning (2018).
- Fardon. Nomenclature and classification of lumbar disc pathology. Spine (2001).
- Dynamic multi-scale filters for semantic segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision (2019).
- Hochreiter, Schmidhuber. Long short-term memory. Neural Comput (1997).
- Hong et al. Decoupled deep neural network for semi-supervised semantic segmentation. arXiv preprint arXiv:1506.04924 (2015).
- Huang et al. Weakly-supervised semantic segmentation network with deep seeded region growing. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018).
- Ioffe, Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning (2015).
- Jha et al. AdaMT-Net: an adaptive weight learning based multi-task learning model for scene understanding. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2020).
- Kendall et al. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018).
- Kingma, Ba. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
Cited by (24)
- SSCK-Net: Spine segmentation in MRI based on cross attention and key-points recognition-assisted learner. Biomedical Signal Processing and Control (2023).
- Automatic aid diagnosis report generation for lumbar disc MR image based on lightweight artificial neural networks. Biomedical Signal Processing and Control (2023).
- RsALUNet: A reinforcement supervision U-Net-based framework for multi-ROI segmentation of medical images. Biomedical Signal Processing and Control (2023).
- Context-aware and local-aware fusion with transformer for medical image segmentation. Physics in Medicine and Biology (2024).