Twist-Net: A multi-modality transfer learning network with the hybrid bilateral encoder for hypopharyngeal cancer segmentation

https://doi.org/10.1016/j.compbiomed.2023.106555

Highlights

  • A hybrid network with multi-modality inputs.

  • Fusing high-level semantic and low-level detailed feature maps.

  • Deformable convolution is used to flexibly adjust the receptive field.

  • Transferring prior experience from large computer vision datasets to multi-modality medical image datasets.

Abstract

Hypopharyngeal cancer (HPC) is a rare disease. It is therefore challenging to automatically segment HPC tumors and metastatic lymph nodes (HPC risk areas) from medical images with only a small-scale dataset. Combining low-level details and high-level semantics from feature maps at different scales can improve segmentation accuracy. Herein, we propose a Multi-Modality Transfer Learning Network with Hybrid Bilateral Encoder (Twist-Net) for hypopharyngeal cancer segmentation. Specifically, we propose a Bilateral Transition (BT) block and a Bilateral Gather (BG) block to twist (fuse) high-level semantic feature maps and low-level detailed feature maps. We design a block with multi-receptive-field extraction capability, the M Block, to capture multi-scale information. To avoid overfitting caused by the small scale of the dataset, we propose a transfer learning method that can transfer prior experience from large computer vision datasets to multi-modality medical imaging datasets. Our method outperforms other methods on the HPC dataset, achieving the highest Dice of 82.98%. It is also superior to other methods on two public medical segmentation datasets, i.e., the CHASE_DB1 and BraTS2018 datasets, on which the Dice of our method is 79.83% and 84.87%, respectively. The code is available at: https://github.com/zhongqiu1245/TwistNet.

Introduction

Magnetic resonance imaging is widely used in hospitals and clinics for cancer diagnosis. However, manual segmentation of tumors in magnetic resonance images (MRIs) is challenging and time-consuming, especially for less-experienced doctors in primary hospitals. Therefore, it is of great significance to segment the regions of interest in MRIs using deep learning methods. However, deep-learning-based image segmentation methods usually require large datasets for training. Hypopharyngeal cancer (HPC) is a rare disease [1]. As a result, methods designed for large-scale datasets cannot be applied to the HPC dataset directly.

Tumors and metastatic lymph nodes in the hypopharynx show the pathological features of HPC. These tumors and metastatic lymph nodes are defined as HPC risk areas. The semantic context of HPC risk areas may occupy a larger area than the tumors and metastatic lymph nodes themselves. As shown in Fig. 1, we need to rely on information from both the inside (yellow boxes in Fig. 1) and the outside (blue boxes in Fig. 1) of the HPC risk areas to comprehensively recognize their shape and extent. This requires image segmentation networks to capture multi-scale information with multiple receptive-field patterns.
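One way to obtain such flexible receptive fields is deformable convolution (see the highlights above). Below is a minimal PyTorch sketch combining dilated branches with a deformable branch; the module names and branch layout are illustrative assumptions and do not reproduce the M Block defined in Section 3.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBranch(nn.Module):
    """A 3x3 deformable convolution whose sampling offsets are predicted
    from the input, letting the receptive field bend around irregular
    structures (illustrative only, not the paper's M Block)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offsets (dx, dy) per position of the 3x3 kernel -> 18 channels
        self.offset = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.deform(x, self.offset(x))

class MultiReceptiveField(nn.Module):
    """Parallel 3x3 branches with different dilation rates plus a deformable
    branch, concatenated to combine several receptive-field patterns."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in (1, 2, 4)
        ])
        self.deform = DeformableBranch(in_ch, out_ch)
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [b(x) for b in self.branches] + [self.deform(x)]
        return self.fuse(torch.cat(feats, dim=1))
```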

MRIs of HPC contain plenty of low-level detailed information and high-level semantic information. The low-level detailed information reflects the texture and size of the HPC risk areas, while the high-level semantic information reflects the deep features of the objects. These two types of information are therefore complementary and essential for semantic segmentation networks, and fusing the low-level detailed information with the high-level semantic information is a key problem.

It is valuable to transfer prior experience from large-scale datasets to small-scale datasets [2]. However, the number of modalities in a medical dataset may not be the same as that in a large computer vision (CV) dataset. Our HPC dataset consists of T1-weighted and T2-weighted MRIs, whereas a CV dataset such as ImageNet has three modalities (channels): Red, Green and Blue. Therefore, this difference in modalities between a large-scale CV dataset and a small-scale medical dataset is an important problem to be solved.
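As a concrete illustration of the mismatch, the sketch below adapts the 3-channel first convolution of an ImageNet-pretrained ResNet50 to a 2-modality (T1/T2) input by averaging the pretrained RGB kernels. This is a common heuristic, assuming PyTorch/torchvision; it is not the transfer learning method proposed in this paper, which is described in Section 3.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Load an encoder pretrained on the 3-channel (RGB) ImageNet dataset.
backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# Our input has 2 modalities (T1- and T2-weighted MRI) instead of 3 channels.
num_modalities = 2
old_conv = backbone.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
new_conv = nn.Conv2d(num_modalities, old_conv.out_channels,
                     kernel_size=old_conv.kernel_size,
                     stride=old_conv.stride,
                     padding=old_conv.padding,
                     bias=False)

with torch.no_grad():
    # Heuristic: initialize each new input channel with the mean of the
    # pretrained RGB kernels so the ImageNet prior is not discarded.
    mean_kernel = old_conv.weight.mean(dim=1, keepdim=True)  # (64, 1, 7, 7)
    new_conv.weight.copy_(mean_kernel.repeat(1, num_modalities, 1, 1))

backbone.conv1 = new_conv
```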

In summary, the HPC dataset is a small-scale dataset containing both high-level semantic information and low-level detailed information. Therefore, we propose a multi-modality transfer learning network with a hybrid bilateral encoder (Twist-Net), which supports multi-modality inputs and can exploit richer information than a single-modality input. Twist-Net introduces the BT Block and BG Block for bidirectionally fusing high-level semantics and low-level details. We propose a block with multi-receptive-field extraction capability, named the M Block, to capture multi-scale features of the MRIs of HPC. Experiments show that Twist-Net can transfer prior experience from large-scale CV datasets to medical imaging datasets. Compared with other methods (U-Net [3], U-Net++ [4], U-Net 3+ [5], Attention U-Net [6] and Double U-Net [7]), our method achieves better performance. We also apply our method to two public medical datasets: the CHASE_DB1 dataset [8] and the BraTS2018 dataset [9]. Compared with the above methods, the Dice score of our method is the highest.
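To make the idea of bidirectional fusion concrete, here is a minimal PyTorch sketch that exchanges information between a high-resolution detail branch and a low-resolution semantic branch via upsampling and downsampling. It is only an illustrative assumption about how such a fusion can be wired; the actual BT and BG blocks are defined in Section 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalFusion(nn.Module):
    """Exchange information between a low-level (high-resolution) detail
    branch and a high-level (low-resolution) semantic branch.
    Illustrative only; not the paper's BT/BG blocks."""
    def __init__(self, detail_ch: int, semantic_ch: int):
        super().__init__()
        self.to_detail = nn.Conv2d(detail_ch + semantic_ch, detail_ch, kernel_size=1)
        self.to_semantic = nn.Conv2d(detail_ch + semantic_ch, semantic_ch, kernel_size=1)

    def forward(self, detail: torch.Tensor, semantic: torch.Tensor):
        # Semantics -> details: upsample the coarse map and fuse.
        up = F.interpolate(semantic, size=detail.shape[-2:], mode="bilinear",
                           align_corners=False)
        detail_out = self.to_detail(torch.cat([detail, up], dim=1))
        # Details -> semantics: downsample the fine map and fuse.
        down = F.adaptive_avg_pool2d(detail, output_size=semantic.shape[-2:])
        semantic_out = self.to_semantic(torch.cat([down, semantic], dim=1))
        return detail_out, semantic_out
```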

The rest of this paper is organized as follows. Section 2 introduces the related works. Section 3 presents the design of our network and the transfer learning method. Section 4 introduces the datasets, data processing strategies, evaluation metrics and implementation details in our work. In Section 5, experimental results are presented and discussed. Finally, conclusions are presented in Section 6.

Section snippets

Convolutional neural networks in medical field

Convolutional neural networks (CNNs) play an important role in medical image segmentation [[10], [11], [12], [13], [14]]. Long et al. [15] proposed a fully convolutional network (FCN), which enabled CNNs to perform pixel-level image segmentation. Ronneberger et al. [3] proposed U-Net to segment cell images. Zhou et al. [4] proposed U-Net++, in which the skip connections of U-Net were redesigned to explore multiscale features. Huang et al. [5] proposed U-Net 3+ to segment liver tumors in abdominal

Network architecture

The architecture of our network is shown in Fig. 2. Our network contains the Fusion Block, Auxiliary Encoder, Main Encoder, Bilateral Encoder and Decoder.

The Fusion Block is responsible for fusing the input information from the different modalities. The Auxiliary Encoder is used to capture high-level semantic information; we use a ResNet50 pretrained on the ImageNet dataset as the Auxiliary Encoder. ResNet50 has five stages: conv_1x, conv_2x, conv_3x, conv_4x and conv_5x. In our paper, we call
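For reference, the following is a minimal sketch of how the five stages of an ImageNet-pretrained ResNet50 can be exposed as encoder outputs using torchvision. The stage grouping and naming below follow torchvision's layer names and are an assumption about the wiring, not the exact Auxiliary Encoder implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class ResNet50Stages(nn.Module):
    """Expose the five stages of an ImageNet-pretrained ResNet50
    (conv_1x ... conv_5x in the paper's naming) so that each stage's
    feature map can be fused with other branches of the network."""
    def __init__(self):
        super().__init__()
        net = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
        self.conv_1x = nn.Sequential(net.conv1, net.bn1, net.relu)  # stride 2
        self.conv_2x = nn.Sequential(net.maxpool, net.layer1)       # stride 4
        self.conv_3x = net.layer2                                    # stride 8
        self.conv_4x = net.layer3                                    # stride 16
        self.conv_5x = net.layer4                                    # stride 32

    def forward(self, x: torch.Tensor):
        c1 = self.conv_1x(x)
        c2 = self.conv_2x(c1)
        c3 = self.conv_3x(c2)
        c4 = self.conv_4x(c3)
        c5 = self.conv_5x(c4)
        return c1, c2, c3, c4, c5
```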

Dataset

In this study, we use an in-house dataset (HPC dataset) and two publicly available datasets (CHASE_DB1 dataset and BraTS2018 dataset) to verify our method.

The HPC dataset is a multi-modality dataset containing T1-weighted and T2-weighted images. It consists of 1727 T1-weighted and 1727 T2-weighted MRI slices (165 cases) from the Cancer Hospital of the Chinese Academy of Medical Sciences. The physical pixel size is 0.5 mm × 0.5 mm. We collected all patients with hypopharyngeal squamous cell

Results

In this section, we conduct experiments to compare the performance of our proposed method with the competing methods. In Table 1, we list the evaluation metrics of these six methods (U-Net, U-Net++, U-Net 3+, Attention U-Net, Double U-Net and our method). For a fair comparison, we unify the encoders of all methods to ResNet50 (pretrained on the ImageNet dataset).
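Dice is the primary metric reported in Table 1. For completeness, here is a minimal sketch of one common way to compute a Dice score for a binary segmentation mask; the threshold and smoothing constant are illustrative choices, not necessarily those used in the paper.

```python
import torch

def dice_score(pred: torch.Tensor, target: torch.Tensor,
               threshold: float = 0.5, eps: float = 1e-6) -> torch.Tensor:
    """Dice = 2*|P ∩ G| / (|P| + |G|) for a binary mask.
    `pred` holds probabilities in [0, 1]; `target` holds 0/1 labels.
    The threshold and epsilon are illustrative defaults."""
    pred_bin = (pred > threshold).float()
    target = target.float()
    intersection = (pred_bin * target).sum()
    return (2.0 * intersection + eps) / (pred_bin.sum() + target.sum() + eps)
```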

As shown in Table 1, our method achieves the best Dice, reaching 82.98%. U-Net 3+ achieves the second-best Dice, reaching 80.51%. The

Conclusion

In this paper, we propose a Multi-modality Transfer Learning Network with Hybrid Bilateral Encoder (Twist-Net) for Hypopharyngeal Cancer Segmentation. The hybrid network with multi-modality inputs can exploit rich features. The BT Block and BG Block can twist (fuse) high-level semantic feature maps and low-level detailed feature maps. The M Block in our network can capture multi-scale information. The proposed network can transfer prior experience from large computer vision datasets to

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grants No. 51975011 and No. U1501253, the Research Funds for Leading Talents Program under No. 048000514120530, and the Non-profit Central Research Institute Fund of the Chinese Academy of Medical Sciences under No. 2019-RC-HL-004.

References (48)

  • Y. Li et al.

    NPCNet: jointly segment primary nasopharyngeal carcinoma tumors and metastatic lymph nodes in MR images

    IEEE Trans. Med. Imag.

    (2022)
  • A.L. Carvalho et al.

    Trends in incidence and prognosis for head and neck cancer in the United States: a site‐specific analysis of the SEER database

    Int. J. Cancer

    (2005)
  • K. He et al.

    Rethinking ImageNet pre-training

  • O. Ronneberger et al.

    U-net: convolutional networks for biomedical image segmentation

  • Z. Zhou et al.

    Redesigning skip connections to exploit multiscale features in image segmentation

    IEEE Trans. Med. Imag.

    (2019)
  • H. Huang et al.

    UNet 3+: a full-scale connected unet for medical image segmentation

  • O. Oktay, J. Schlemper, L.L. Folgoc, M. Lee, M. Heinrich, K. Misawa, D. Rueckert, Attention u-net: Learning where to...
  • D. Jha et al.

    Doubleu-net: a deep convolutional neural network for medical image segmentation

  • C.G. Owen et al.

    Measuring retinal vessel tortuosity in 10-year-old children: validation of the computer-assisted image analysis of the retina (CAIAR) program

    Investig. Ophthalmol. Vis. Sci.

    (2009)
  • B.H. Menze et al.

    The multimodal brain tumor image segmentation benchmark (BRATS)

    IEEE Trans. Med. Imag.

    (2014)
  • B. Zhang et al.

    Multi-scale segmentation squeeze-and-excitation UNet with conditional random field for segmenting lung tumor from CT images

    Comput. Methods Progr. Biomed.

    (2022)
  • R. Zheng et al.

    Automatic liver tumor segmentation on dynamic contrast enhanced MRI using 4D information: deep learning model based on 3D convolution and convolutional LSTM

    IEEE Trans. Med. Imag.

    (2022)
  • D.T. Kushnure et al.

    HFRU-net: high-level feature fusion and recalibration UNet for automatic liver and tumor segmentation in CT images

    Comput. Methods Progr. Biomed.

    (2022)
  • X. Yu et al.

    Gross Tumor Volume Segmentation for Stage III NSCLC Radiotherapy Using 3D ResSE-Unet

    (2022)
  • J. Long et al.

    Fully convolutional networks for semantic segmentation

  • T. Zhou et al.

    A multi-modality fusion network based on attention mechanism for brain tumor segmentation

  • H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, M. Wang, Swin-unet: Unet-like pure transformer for medical image...
  • J. L. Silva, M. N. Menezes, T. Rodrigues, B. Silva, F. J. Pinto, A. L. Oliveira, Encoder-decoder architectures for...
  • F. Fang et al.

    Self-supervised multi-modal hybrid fusion network for brain tumor segmentation

    IEEE Journal of Biomedical and Health Informatics

    (2021)
  • H. Su et al.

    Multilevel threshold image segmentation for COVID-19 chest radiography: a framework using horizontal and vertical multiverse optimization

    Comput. Biol. Med.

    (2022)
  • A. Qi et al.

    Directional mutation and crossover boosted ant colony optimization with application to COVID-19 X-ray image segmentation

    Comput. Biol. Med.

    (2022)
  • K. Hu et al.

    Colorectal polyp region extraction using saliency detection network with neutrosophic enhancement

    Comput. Biol. Med.

    (2022)
  • Z. Wu et al.

    How to ensure the confidentiality of electronic medical records on the cloud: a technical perspective

    Comput. Biol. Med.

    (2022)
  • J. Zhou et al.

    Recognition of imbalanced epileptic EEG signals by a graph-based extreme learning machine

    Wireless Commun. Mobile Comput.

    (2021)