Twist-Net: A multi-modality transfer learning network with the hybrid bilateral encoder for hypopharyngeal cancer segmentation
Introduction
Magnetic resonance imaging is widely used in hospitals and clinics for cancer diagnosis. However, manual segmentation of tumors in magnetic resonance images (MRIs) is challenging and time-consuming, especially for less-experienced doctors in primary hospitals. Therefore, it is of great significance to segment the regions of interest in MRIs with deep learning methods. However, deep-learning-based image segmentation methods usually require large datasets for training, and hypopharyngeal cancer (HPC) is a rare disease [1]. As a result, HPC datasets are small, and methods designed for large-scale datasets cannot be applied to them directly.
Tumors and metastatic lymph nodes in the hypopharynx show the pathological features of HPC; these tumors and metastatic lymph nodes are defined as HPC risk areas. The semantic information relevant to an HPC risk area may occupy a larger region than the tumors and metastatic lymph nodes themselves. As shown in Fig. 1, we need to rely on information from both the inside (yellow boxes in Fig. 1) and the outside (blue boxes in Fig. 1) of the HPC risk areas to comprehensively recognize their shape and extent. This requires image segmentation networks to capture multi-scale information through multiple receptive-field patterns.
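One common way to obtain multiple receptive-field patterns is to run parallel dilated convolutions over the same feature map and fuse the branches. The sketch below illustrates that general idea only; the class name, dilation rates, and layer choices are assumptions for illustration, not the paper's actual M Module.

```python
import torch
import torch.nn as nn


class MultiReceptiveFieldBlock(nn.Module):
    """Illustrative multi-receptive-field block (not the paper's M Module).

    Parallel 3x3 convolutions with different dilation rates see the same
    feature map through different receptive-field sizes; a 1x1 convolution
    fuses the concatenated branch outputs.
    """

    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated branch outputs.
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

With `padding` equal to `dilation` for a 3x3 kernel, every branch preserves the spatial size, so the outputs can be concatenated along the channel dimension directly.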
MRIs of HPC contain plenty of low-level detailed information and high-level semantic information. The low-level detailed information can reflect the texture and size of the HPC risk areas. The high-level semantic information can reflect the deep features of objects. Therefore, the information types are complementary and essential for semantic segmentation networks. Fusion of the low-level detailed information and high-level semantic information is the key problem.
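A minimal sketch of such a fusion, under the common assumption that high-level semantic maps have lower spatial resolution than low-level detail maps: upsample the semantic features, concatenate with the detail features, and mix with a convolution. The class and parameter names here are illustrative, not the paper's BT/BG Blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticDetailFusion(nn.Module):
    """Illustrative fusion of a high-level (low-resolution) feature map
    with a low-level (high-resolution) one: upsample the semantic map to
    the detail resolution, concatenate, and mix with a 3x3 convolution."""

    def __init__(self, high_ch: int, low_ch: int, out_ch: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(high_ch + low_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # Bring the semantic map up to the resolution of the detail map.
        high = F.interpolate(high, size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        return self.mix(torch.cat([high, low], dim=1))
```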
It is valuable to transfer prior experience from large-scale datasets to small-scale datasets [2]. However, the number of modalities in a medical dataset may not be the same as in a large computer vision (CV) dataset. Our HPC dataset consists of T1-weighted and T2-weighted MRIs, whereas a CV dataset such as ImageNet has three modalities (channels): red, green and blue. Bridging this difference in modalities between a large-scale CV dataset and a small-scale medical dataset is therefore an important problem.
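One simple way to bridge such a channel mismatch is a learned projection that maps the medical modalities to the three-channel layout an ImageNet-pretrained backbone expects. This is only an illustrative sketch of the general idea; the paper's Fusion Block may differ, and the names below are assumptions.

```python
import torch
import torch.nn as nn


class ModalityFusionStem(nn.Module):
    """Illustrative adapter (not the paper's Fusion Block): project a
    2-channel T1/T2 input to the 3-channel layout expected by an
    ImageNet-pretrained backbone, via a learned 1x1 convolution."""

    def __init__(self, in_modalities: int = 2, out_ch: int = 3):
        super().__init__()
        self.project = nn.Conv2d(in_modalities, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(x)
```

A 1x1 convolution keeps the spatial resolution unchanged and lets the network learn how to weight each modality per output channel.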
In summary, the HPC dataset is a small-scale dataset containing both high-level semantic information and low-level detailed information. We therefore propose a multi-modality transfer learning network with a hybrid bilateral encoder (Twist-Net), which supports multi-modality inputs and can exploit richer information than a single-modality input. Twist-Net introduces the BT Block and BG Block to bidirectionally fuse high-level semantics and low-level details. We also propose a module capable of extracting features under multiple receptive fields, named the M Module, to capture multi-scale features of HPC MRIs. Experiments show that Twist-Net can transfer prior experience from large-scale CV datasets to medical imaging datasets. Compared with other methods (U-Net [3], U-Net++ [4], U-Net 3+ [5], Attention U-Net [6] and Double U-Net [7]), our method achieves better performance. We also apply our method to two public medical datasets: the CHASE_DB1 dataset [8] and the BraTS2018 dataset [9]. Compared with the above methods, our method achieves the highest Dice score on both.
The rest of this paper is organized as follows. Section 2 introduces the related works. Section 3 presents the design of our network and the transfer learning method. Section 4 introduces the datasets, data processing strategies, evaluation metrics and implementation details in our work. In Section 5, experimental results are presented and discussed. Finally, conclusions are presented in Section 6.
Section snippets
Convolutional neural networks in medical field
Convolutional neural networks (CNNs) play an important role in medical image segmentation [[10], [11], [12], [13], [14]]. Long et al. [15] proposed the fully convolutional network (FCN), which enabled CNNs to perform pixel-level image segmentation. Ronneberger et al. [3] proposed U-Net to segment cell images. Zhou et al. [4] proposed U-Net++, in which the skip connections of U-Net were redesigned to explore multiscale features. Huang et al. [5] proposed U-Net 3+ to segment liver tumors in abdominal
Network architecture
The architecture of our network is shown in Fig. 2. Our network contains the Fusion Block, Auxiliary Encoder, Main Encoder, Bilateral Encoder and Decoder.
The Fusion Block is responsible for fusing the input information from different modalities. The Auxiliary Encoder is used to capture high-level semantic information. We use ResNet50, pretrained on the ImageNet dataset, as the Auxiliary Encoder. ResNet50 has five stages: conv_1x, conv_2x, conv_3x, conv_4x and conv_5x. In our paper, we call
Dataset
In this study, we use an in-house dataset (HPC dataset) and two publicly available datasets (CHASE_DB1 dataset and BraTS2018 dataset) to verify our method.
The HPC dataset is a multi-modality dataset containing T1-weighted and T2-weighted images. It consists of 1727 MRI T1-weighted and 1727 MRI T2-weighted slices (165 cases) from the Cancer Hospital of the Chinese Academy of Medical Sciences. The physical pixel size is 0.5 mm × 0.5 mm. We collected all patients with hypopharyngeal squamous cell
Results
In this section, we conduct experiments to compare the performance of our proposed method with the contrasting methods. In Table 1, we list the evaluation metrics of six methods (U-Net, U-Net++, U-Net 3+, Attention U-Net, Double U-Net and our method). For a fair comparison, we unify the encoders of all methods to ResNet50 (pretrained on the ImageNet dataset).
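The Dice score used for comparison in Table 1 is the standard overlap measure 2|A∩B| / (|A|+|B|) between predicted and ground-truth masks; a minimal reference implementation (the function name and epsilon smoothing are our own choices):

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray,
               eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).

    eps avoids division by zero when both masks are empty.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))
```

For example, masks that agree on one of two foreground pixels score 2·1 / (2+1) ≈ 0.667, and identical masks score 1.0.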
As shown in Table 1, our method achieves the best Dice, reaching 82.98%. U-Net 3+ achieves the second-best Dice, reaching 80.51%. The
Conclusion
In this paper, we propose a multi-modality transfer learning network with a hybrid bilateral encoder (Twist-Net) for hypopharyngeal cancer segmentation. The hybrid network with multi-modality inputs can exploit rich features. The BT Block and BG Block can twist (fuse) high-level semantic feature maps and low-level detailed feature maps. The M Module in our network can capture multi-scale information. The proposed network can transfer prior experience from large computer vision datasets to
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grants No. 51975011 and No. U1501253, the Research Funds for Leading Talents Program under Grant No. 048000514120530, and the Non-profit Central Research Institute Fund of the Chinese Academy of Medical Sciences under Grant No. 2019-RC-HL-004.
References (48)
- NPCNet: jointly segment primary nasopharyngeal carcinoma tumors and metastatic lymph nodes in MR images, IEEE Trans. Med. Imag. (2022)
- Trends in incidence and prognosis for head and neck cancer in the United States: a site-specific analysis of the SEER database, Int. J. Cancer (2005)
- K. He et al., Rethinking ImageNet pre-training
- O. Ronneberger et al., U-Net: convolutional networks for biomedical image segmentation
- Z. Zhou et al., Redesigning skip connections to exploit multiscale features in image segmentation, IEEE Trans. Med. Imag. (2019)
- H. Huang et al., UNet 3+: a full-scale connected UNet for medical image segmentation
- O. Oktay, J. Schlemper, L.L. Folgoc, M. Lee, M. Heinrich, K. Misawa, D. Rueckert, et al., Attention U-Net: learning where to look for the pancreas
- D. Jha et al., DoubleU-Net: a deep convolutional neural network for medical image segmentation
- Measuring retinal vessel tortuosity in 10-year-old children: validation of the computer-assisted image analysis of the retina (CAIAR) program, Investig. Ophthalmol. Vis. Sci. (2009)
- B.H. Menze et al., The multimodal brain tumor image segmentation benchmark (BRATS), IEEE Trans. Med. Imag. (2014)
- Multi-scale segmentation squeeze-and-excitation UNet with conditional random field for segmenting lung tumor from CT images, Comput. Methods Progr. Biomed.
- Automatic liver tumor segmentation on dynamic contrast enhanced MRI using 4D information: deep learning model based on 3D convolution and convolutional LSTM, IEEE Trans. Med. Imag.
- HFRU-Net: high-level feature fusion and recalibration UNet for automatic liver and tumor segmentation in CT images, Comput. Methods Progr. Biomed.
- Gross tumor volume segmentation for stage III NSCLC radiotherapy using 3D ResSE-Unet
- J. Long et al., Fully convolutional networks for semantic segmentation
- A multi-modality fusion network based on attention mechanism for brain tumor segmentation
- Self-supervised multi-modal hybrid fusion network for brain tumor segmentation, IEEE J. Biomed. Health Inform.
- Multilevel threshold image segmentation for COVID-19 chest radiography: a framework using horizontal and vertical multiverse optimization, Comput. Biol. Med.
- Directional mutation and crossover boosted ant colony optimization with application to COVID-19 X-ray image segmentation, Comput. Biol. Med.
- Colorectal polyp region extraction using saliency detection network with neutrosophic enhancement, Comput. Biol. Med.
- How to ensure the confidentiality of electronic medical records on the cloud: a technical perspective, Comput. Biol. Med.
- Recognition of imbalanced epileptic EEG signals by a graph-based extreme learning machine, Wireless Commun. Mobile Comput.