Unsupervised X-ray image segmentation with task driven generative adversarial networks
Introduction
X-ray imaging is one of the most frequently used clinical exams. Semantic understanding of anatomical structures in X-ray images is critical to many clinical applications, such as pathological diagnosis, treatment evaluation and surgical planning. It serves as a fundamental step for computer-aided diagnosis as well as image-guided intervention, and can enable intelligent workflows including organ-based autocollimation, infinite-capture-range registration, motion compensation and automatic reporting (Aggarwal et al., 2011; Sharma and Aggarwal, 2010). In this paper, we study one of the most important problems in semantic understanding of X-ray images, i.e., multi-organ segmentation.
While X-ray understanding is of great clinical importance, it remains a very challenging task, mainly due to the projective nature of X-ray imaging. Because image formation integrates radiation absorption along the X-ray trajectories, the three-dimensional spatial relationships between anatomies are compressed into two dimensions. This information loss creates many difficulties for semantic X-ray image parsing, including heavily overlapping anatomies, fuzzy object boundaries and complex texture patterns.
Conventional segmentation methods rely on prior knowledge of the procedure (e.g., the anatomical motion pattern from a sequence of images (Zhu et al., 2009)) to delineate anatomical structures from X-ray images, which can be time-consuming and yields limited performance. Modern approaches utilize deep convolutional networks and have shown superior performance (Ronneberger et al., 2015). However, they suffer from two major drawbacks. First, most of them are supervised approaches, so model training cannot proceed if data annotations are unavailable. Indeed, they often require a large amount of annotated images. The most widely used segmentation networks, such as the fully convolutional network (FCN) (Long et al., 2015), UNet (Ronneberger et al., 2015) and the recently developed dense UNet (Jégou et al., 2017), usually contain millions of parameters and can easily overfit if the training dataset is small. Due to the heterogeneous nature of X-ray images, accurate annotation is extremely difficult and time-consuming even for skilled clinicians. Obtaining a large enough annotated X-ray dataset is thus impractical. Second, most of these deep models are domain specific and generalize poorly. A well-trained model may achieve great performance on one image modality but is very likely to fail on others; for example, a network trained on computed tomography (CT) is barely able to work on magnetic resonance imaging (MRI). To obtain segmentation models for different types of images, clinicians would need to annotate each image modality separately, which is surely undesired.
In comparison with the difficulty of annotating X-ray images, organs in 3D CT scans preserve clearer structures as well as sharper boundaries and thus can be more easily delineated. Large pixel-level labeled CT datasets are therefore more accessible. For example, Yang et al. (2017) trained an image-to-image network for segmentation on hundreds of labeled 3D CT scans. Thousands of X-ray-like images, the so-called Digitally Reconstructed Radiographs (DRRs), are rendered from labeled CTs and used in Albarqouni et al. (2017) to train an X-ray depth decomposition model. While using automatically generated DRRs for training has merits, the trained model cannot be directly applied to X-ray images due to the domain gap (appearance differences) between them; see Fig. 1. In this paper, we aim to answer the following question: given annotated synthetic DRRs, can we learn a model that segments real X-ray images without any X-ray annotations?
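The key property of a DRR is that it comes with pixel-level labels for free: projecting the CT attenuation along rays produces the X-ray-like image, and projecting the 3D organ masks along the same rays produces the 2D annotation. The exact rendering pipeline is not specified in this excerpt; the following is a deliberately simplified parallel-beam sketch (real DRR rendering uses perspective ray casting), with all function names our own:

```python
import numpy as np

def render_drr(ct_volume, axis=1):
    """Render a toy parallel-beam DRR by integrating attenuation along one
    axis of a CT volume (a crude stand-in for perspective ray casting)."""
    attenuation = np.clip(ct_volume, 0, None)      # treat negative values (air) as zero
    projection = attenuation.sum(axis=axis)        # line integrals along the beam
    drr = np.exp(-projection / projection.max())   # Beer-Lambert style falloff
    return (drr - drr.min()) / (drr.max() - drr.min() + 1e-8)

def project_labels(label_volume, organ_id, axis=1):
    """Project a 3D organ mask into the same 2D view, yielding the
    pixel-level DRR annotation without any manual labeling."""
    return (label_volume == organ_id).any(axis=axis).astype(np.uint8)

# toy 3D volume: a dense "organ" block surrounded by air
vol = np.zeros((8, 8, 8)); vol[2:6, 2:6, 2:6] = 300.0
lab = np.zeros((8, 8, 8), dtype=np.uint8); lab[2:6, 2:6, 2:6] = 1
drr = render_drr(vol)                 # 8x8 X-ray-like image
mask = project_labels(lab, organ_id=1)  # matching 8x8 label map
```

The same projection geometry is applied to the image and the labels, which is what makes large labeled CT repositories a cheap source of annotated 2D training data.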
Generalizing image segmentation models trained on DRRs to X-ray images is an unsupervised domain adaptation problem. Given labeled data from the source domain (DRRs) and unlabeled data from the target domain (X-rays), our goal is to learn a segmentation model on the source data and adapt it to the target data. Since the data in the target domain are unlabeled, such adaptation is unsupervised. Many effective models (Bousmalis et al., 2016; Tzeng et al., 2017) have been studied. Most of them focus on feature adaptation, which naturally suits recognition and detection. However, image segmentation requires pixel-wise classification, which demands delicate model design and is substantially different. Recently, pixel-level adaptation models (Bousmalis et al., 2017; Zhu et al., 2017) have been proposed that utilize generative adversarial networks and achieve promising results on image synthesis and recognition. Still, their application to image segmentation, especially in medical settings, remains unexplored.
In this paper, we present a two-step model framework to address this challenge. In the first step, we generate synthetic DRRs as well as their pixel-level labels from segmented pre-operative 3D CT scans. A deep image-to-image network (DI2I) (Huang et al., 2017; Jégou et al., 2017) is trained for multi-organ (lung, heart, liver, bone) segmentation on these synthetic data. In the second step, inspired by the recent success of image style transfer with the cycle generative adversarial network (cycle-GAN) (Zhu et al., 2017), we introduce a task-driven generative adversarial network (TD-GAN) to achieve simultaneous image synthesis and automatic segmentation on X-ray images; see Fig. 2 for an overview. We remark that the X-ray images used for training are unpaired with the previously generated DRRs and are completely unlabeled. The proposed TD-GAN contains a modified cycle-GAN substructure for pixel-to-pixel translation between DRRs and X-ray images. Meanwhile, TD-GAN incorporates the pre-trained DI2I to provide deep supervision and enforce consistent segmentation performance. The intuition behind TD-GAN is indeed very simple: we transform X-ray images so that they share the appearance of DRRs and hence can be segmented by the pre-trained DI2I model, while the entire transformation is guided by the segmentation supervision network.
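Conceptually, the generator objective combines the usual cycle-GAN terms with a task term supplied by the frozen DI2I. The exact form of the task loss is not given in this excerpt; the sketch below uses a simple segmentation-consistency placeholder, and all function names (`G_x2d`, `G_d2x`, `D_drr`, `segmenter`) and weights are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def l1(a, b):
    return np.abs(a - b).mean()

def lsgan_g_loss(d_fake):
    # least-squares GAN loss for the generator: push D(fake) toward 1
    return ((d_fake - 1.0) ** 2).mean()

def td_gan_generator_loss(xray, drr, G_x2d, G_d2x, D_drr, segmenter,
                          lam_cyc=10.0, lam_task=1.0):
    """Generator-side objective of a TD-GAN-style model (simplified sketch;
    only the X-ray -> DRR direction is shown, the symmetric
    DRR -> X-ray terms are analogous)."""
    fake_drr = G_x2d(xray)               # translate the X-ray into DRR style
    adv = lsgan_g_loss(D_drr(fake_drr))  # fool the DRR discriminator
    cyc = l1(G_d2x(fake_drr), xray)      # cycle consistency back to X-ray
    # task-driven term: the frozen segmenter should respond to the translated
    # image consistently (here: agreement with its output on a real DRR)
    task = l1(segmenter(fake_drr), segmenter(drr))
    return adv + lam_cyc * cyc + lam_task * task
```

The essential design choice is that the segmenter's weights are never updated here, so the translation network carries the full burden of making X-rays "look like" images the DI2I already understands.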
The contributions of our work are: 1) We propose a novel model pipeline for X-ray image segmentation from unpaired synthetic DRRs. 2) We introduce an effective deep architecture, TD-GAN, for simultaneous image synthesis and segmentation without any labeling effort on X-ray images. To the best of our knowledge, this is the first end-to-end framework for unsupervised medical image segmentation. 3) The entire model framework can be easily adapted to unsupervised domain adaptation problems where labels from one domain are completely missing. 4) We conduct numerical experiments and demonstrate the effectiveness of the proposed model on over 300 unlabeled topograms and 500 unlabeled chest X-ray images, using synthetic DRRs generated from over 800 CT scans.
This paper is organized as follows. In Section 2 we review existing methods for image segmentation and domain adaptation. In Section 3 we state the problem and discuss our methodology in detail. Numerical experiments and their quantitative and qualitative results are presented in Section 4. Finally, we conclude and propose possible future directions in Section 5.
The key concepts were first introduced in Zhang et al. (2018) and are extended in this paper. We present comparisons between several image-to-image network models on DRR segmentation and on X-ray segmentation with their induced TD-GAN variations. Compared with the dataset presented in Zhang et al. (2018), which consists of 153 topograms, we evaluate the proposed TD-GAN model on a larger dataset of 328 topograms with more detailed analysis. Furthermore, we demonstrate the effectiveness of the proposed framework with a qualitative study on over 500 chest X-ray images randomly selected from the NIH public dataset (Wang et al., 2017), using the same DRR dataset. Since this dataset does not contain ground truth annotations, we present qualitative results on these images. A quantitative study has also been added on the public JSRT dataset, which contains over 200 annotated chest X-rays.
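The quantitative evaluation on annotated data such as JSRT presumably reports an overlap metric; the standard per-organ choice in segmentation work is the Dice coefficient, sketched here (the exact metric used is not stated in this excerpt):

```python
import numpy as np

def dice_score(pred, gt, eps=1e-8):
    """Dice overlap between a binary predicted mask and a ground-truth mask:
    2|P intersect G| / (|P| + |G|), ranging from 0 (disjoint) to 1 (identical)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# toy example: two half-overlapping 8-pixel masks
pred = np.zeros((4, 4)); pred[:, :2] = 1   # columns 0-1
gt = np.zeros((4, 4)); gt[:, 1:3] = 1      # columns 1-2
score = dice_score(pred, gt)               # 2*4 / (8 + 8) = 0.5
```

Such a score would be computed per organ (lung, heart, liver, bone) and averaged over the test images.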
Related works
Semantic Segmentation. Image segmentation is one of the fundamental problems in computer vision and medical image processing. Many approaches have been proposed in the literature, including level set models (Vese and Chan, 2002), graph-cut models (Boykov and Funka-Lea, 2006) and learning-based models (Ronneberger et al., 2015), among others.
In recent years, deep network models have been extensively studied and have shown better performance, usually at faster speed, than conventional methods. One of
Problem and methodology
In this section, we present our two-step model framework in detail. In Section 3.1, we describe the first step, in which an image-to-image network is trained for segmentation on synthetic DRRs. In Section 3.2, we describe the second step, which incorporates the pretrained model into a task-driven generative adversarial network to achieve simultaneous image synthesis and segmentation on X-ray images.
Our goal is to learn an unsupervised multi-organ segmentation model on X-ray images using pixel-wise
Experiments and results
We validate our methodology on different datasets both quantitatively and qualitatively. We use 815 labeled DRRs, generated from 3D CT scans through the aforementioned simulation process, as the source-domain data. Most of the DRR images cover a broad field of view from the neck down to the kidneys. We apply TD-GAN to two different target domains, topograms and chest X-ray images. To demonstrate the effectiveness of the TD-GAN architecture, we vary the image-to-image networks and
Conclusions
In this paper, we studied the unsupervised multi-organ segmentation problem on X-ray images with a novel task driven generative adversarial network model. The proposed model framework takes synthetic labeled DRR images as input, and is able to produce meaningful segmentation results on real X-ray images without any ground truth annotations. It leverages a cycle-GAN substructure to achieve image style transfer and carefully designed add-on modules to simultaneously segment organs of interest. We
Declaration of Competing Interest
None.
Acknowledgment
The authors would like to thank Vivek Kumar Singh for aiding in the preparation of the topogram dataset. We also thank the reviewers for their helpful feedback and suggestions which greatly improved this paper.
References
- A multiphase level set framework for image segmentation using the Mumford and Shah model. Int. J. Comput. Vis. (2002)
- Role of segmentation in medical imaging: a comparative study. Int. J. Comput. Appl. (2011)
- X-ray in-depth decomposition: revealing the latent structures. International Conference on Medical Image Computing and Computer-Assisted Intervention (2017)
- SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2017)
- Unsupervised pixel-level domain adaptation with generative adversarial networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
- Domain separation networks. Advances in Neural Information Processing Systems (2016)
- Graph cuts and efficient N-D image segmentation. Int. J. Comput. Vis. (2006)
- DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- Semi-supervised domain adaptation with instance constraints. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
- Domain-adversarial training of neural networks. The Journal of Machine Learning Research
- Fast R-CNN. IEEE International Conference on Computer Vision (ICCV) (2015)
- Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation. International Conference on Machine Learning
- Mask R-CNN. IEEE International Conference on Computer Vision (ICCV)
- Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Densely connected convolutional networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
- The one hundred layers Tiramisu: fully convolutional DenseNets for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2017)
- Ways of conditioning generative adversarial networks. Workshop on Adversarial Training, Neural Information Processing Systems (NeurIPS)
- Path aggregation network for instance segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Fully convolutional networks for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Transfer feature learning with joint distribution adaptation. IEEE International Conference on Computer Vision (ICCV)