Unsupervised X-ray image segmentation with task driven generative adversarial networks

https://doi.org/10.1016/j.media.2020.101664

Highlights

  • Generative adversarial network with added supervision from task-related networks.

  • Simultaneous image synthesis and parsing between annotated DRRs and unannotated X-rays.

  • Unsupervised X-ray image segmentation with accuracy approaching that of a supervised protocol.

Abstract

Semantic parsing of anatomical structures in X-ray images is a critical task in many clinical applications. Modern methods leverage deep convolutional networks and generally require a large amount of labeled data for model training. However, obtaining accurate pixel-wise labels on X-ray images is very challenging due to overlapping anatomies and complex texture patterns. In comparison, labeled CT data are more accessible, since organs in 3D CT scans preserve clearer structures and can thus be easily delineated. In this paper, we propose a model framework for learning automatic X-ray image parsing from labeled 3D CT scans. Specifically, a Deep Image-to-Image network (DI2I) for multi-organ segmentation is first trained on X-ray-like Digitally Reconstructed Radiographs (DRRs) rendered from 3D CT volumes. We then build a Task Driven Generative Adversarial Network (TD-GAN) to achieve simultaneous synthesis and parsing for unseen real X-ray images. The entire model pipeline does not require any annotations from the X-ray image domain. In the numerical experiments, we validate the proposed model on over 800 DRRs and 300 topograms. While the vanilla DI2I trained on DRRs fails completely at segmenting the topograms without adaptation, the proposed model requires no topogram labels and provides a promising average Dice of 86%, approaching the accuracy of supervised training (89%). Furthermore, we demonstrate the generality of TD-GAN through quantitative and qualitative studies on widely used public datasets.

Introduction

X-ray imaging is one of the most frequently used clinical exams. Semantic understanding of anatomical structures in X-ray images is critical to many clinical applications, such as pathological diagnosis, treatment evaluation and surgical planning. It serves as a fundamental step for computer-aided diagnosis as well as image-guided intervention, and can enable intelligent workflows including organ-based autocollimation, infinite-capture-range registration, motion compensation and automatic reporting (Aggarwal et al., 2011; Sharma and Aggarwal, 2010). In this paper, we study one of the most important problems in semantic understanding of X-ray images, i.e., multi-organ segmentation.

While X-ray understanding is of great clinical importance, it remains a very challenging task. This is mainly due to the projective nature of X-ray imaging. Since image formation is based on radiation absorption along the X-ray trajectories, three-dimensional spatial information between the anatomies is compressed into two dimensions. Such information loss causes many difficulties for semantic X-ray image parsing, including large overlaps between anatomies, fuzzy object boundaries and complex texture patterns.
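
This projection can be made precise with the Beer–Lambert attenuation law, the standard model of X-ray image formation (stated here for intuition; it is not developed further in this excerpt):

$$I = I_0 \exp\Big(-\int_{\mathrm{ray}} \mu(x)\,\mathrm{d}x\Big),$$

where $I_0$ is the source intensity and $\mu$ is the tissue attenuation coefficient along the ray. Every structure along the path contributes to the same detector pixel, which is precisely why depth information is lost.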

Conventional segmentation methods rely on prior knowledge of the procedure (e.g., anatomical motion patterns from a sequence of images (Zhu et al., 2009)) to delineate anatomical structures from X-ray images, which can be time-consuming and offers limited performance. Modern approaches utilize deep convolutional networks and have shown superior performance (Ronneberger et al., 2015). However, they suffer from two major drawbacks. First, most of them are supervised approaches, so model training cannot proceed if data annotations are unavailable. Indeed, they often require a large amount of annotated images. The most widely used segmentation networks, such as the fully convolutional network (FCN) (Long et al., 2015), UNet (Ronneberger et al., 2015) and the recently developed dense UNet (Jégou et al., 2017), usually contain millions of parameters and can easily overfit if the training dataset is small. Due to the heterogeneous nature of X-ray images, accurate annotation is extremely difficult and time-consuming even for skilled clinicians. Obtaining a sufficiently large annotated X-ray dataset is thus impractical. Second, most of these deep models are domain specific and their generalizability is limited. Well-trained models may achieve great performance on one image modality but are very likely to fail on others. For example, a network trained on computed tomography (CT) is barely able to work on magnetic resonance imaging (MRI). In order to obtain a segmentation model for different types of images, clinicians would need to annotate each of the image modalities, which is clearly undesirable.

In comparison with the difficulties of annotating X-ray images, organs in 3D CT scans preserve clearer structures as well as sharper boundaries and thus can be easily delineated. Large pixel-level labeled CT datasets are therefore more accessible. For example, Yang et al. (2017) trained an image-to-image network for segmentation on hundreds of labeled 3D CT scans. Thousands of X-ray-like images, the so-called Digitally Reconstructed Radiographs (DRRs), were rendered from labeled CTs and used in Albarqouni et al. (2017) to train an X-ray depth decomposition model. While using automatically generated DRRs for training has merits, the trained model cannot be directly applied to X-ray images due to the domain gap (appearance differences) between them; see Fig. 1. In this paper, we aim to answer the following question: given annotated synthetic DRRs, can we learn a model that segments real X-ray images without any X-ray annotations?
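
To make the rendering step concrete, the following toy parallel-projection DRR is a crude discretization of the attenuation integral above. The HU-to-attenuation conversion, the log compression and the per-organ projection of labels are our illustrative assumptions; the authors' rendering pipeline is not described at this level of detail.

```python
import numpy as np

def toy_drr(ct_hu: np.ndarray, labels: np.ndarray, axis: int = 1):
    """Render a crude parallel-projection DRR and its 2D organ masks.

    ct_hu:  3D CT volume in Hounsfield units.
    labels: 3D integer organ-label volume aligned with ct_hu.
    axis:   projection axis (e.g., anterior-posterior).
    """
    # Crude HU -> nonnegative attenuation proxy (air ~ -1000 HU -> 0).
    mu = np.clip(ct_hu, -1000, None) + 1000.0
    # Discrete line integral: sum attenuation along the projection axis.
    line_integral = mu.sum(axis=axis)
    # Log-style compression mimics detector response; normalize to [0, 1].
    drr = np.log1p(line_integral)
    drr /= drr.max()
    # A pixel receives an organ label if the organ appears anywhere along
    # the ray; per-organ binary masks sidestep depth-ordering ambiguity.
    organ_masks = {int(k): (labels == k).any(axis=axis)
                   for k in np.unique(labels) if k != 0}
    return drr, organ_masks
```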

Generalizing image segmentation models trained on DRRs to X-ray images is an unsupervised domain adaptation problem. Given labeled data from the source domain (DRRs) and unlabeled data from the target domain (X-rays), our goal is to learn a segmentation model on the source data and adapt it to the target data. Since the data in the target domain are unlabeled, such adaptation is unsupervised. Many effective models have been studied (Bousmalis et al., 2016; Tzeng et al., 2017). Most of them focus on feature adaptation, which naturally suits recognition and detection. However, image segmentation requires pixel-wise classification, which demands delicate model design and is substantially different. Recently, pixel-level adaptation models (Bousmalis et al., 2017; Zhu et al., 2017) have been proposed which utilize generative adversarial networks and achieve promising results on image synthesis and recognition. Still, their application to image segmentation, especially for medical imaging, remains largely unexplored.

In this paper, we present a two-step model framework to address this challenge. In the first step, we generate synthetic DRRs together with their pixel-level labels from segmented pre-operative 3D CT scans. A Deep Image-to-Image network (DI2I) (Huang et al., 2017; Jégou et al., 2017) is trained for multi-organ (lung, heart, liver, bone) segmentation on these synthetic data. In the second step, inspired by the recent success of image style transfer with the cycle generative adversarial network (cycle-GAN) (Zhu et al., 2017), we introduce a task driven generative adversarial network (TD-GAN) to achieve simultaneous image synthesis and automatic segmentation of X-ray images; see Fig. 2 for an overview. We remark that the X-ray images used for training are unpaired with the previously generated DRRs and are completely unlabeled. The proposed TD-GAN consists of a modified cycle-GAN substructure for pixel-to-pixel translation between DRRs and X-ray images. Meanwhile, TD-GAN incorporates the pre-trained DI2I to obtain deep supervision and enforce consistent segmentation performance. The intuition behind TD-GAN is indeed very simple: we translate X-ray images into the appearance of DRRs and can hence leverage the pre-trained DI2I model to segment them. Furthermore, the entire translation is guided by the segmentation supervision network.
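
As a concrete illustration, the following PyTorch sketch composes the three ingredients just described: cycle-GAN adversarial and cycle-consistency terms, plus a task-driven term from the frozen DI2I. All module names and loss weights are our illustrative assumptions, and the exact placement of the segmentation supervision follows our reading of the description above, not the authors' released code.

```python
import torch
import torch.nn as nn

# Loss criteria: least-squares GAN and L1 cycle terms are common cycle-GAN
# choices; pixel-wise BCE stands in for the DI2I segmentation loss.
adv_criterion = nn.MSELoss()
cycle_criterion = nn.L1Loss()
seg_criterion = nn.BCEWithLogitsLoss()

def td_gan_generator_loss(G_x2d, G_d2x, D_drr, D_xray, di2i,
                          drr, drr_label, xray,
                          lam_cyc=10.0, lam_seg=1.0):
    """Generator-side objective: adversarial + cycle + task-driven terms.

    G_x2d / G_d2x: X-ray -> DRR and DRR -> X-ray generators.
    D_drr / D_xray: domain discriminators.
    di2i: DRR-pretrained segmentation network with frozen parameters
          (requires_grad=False); gradients reach the generators only
          through the synthesized images.
    drr_label: float organ masks, shape (B, 4, H, W).
    """
    fake_drr = G_x2d(xray)    # real X-ray rendered in DRR appearance
    fake_xray = G_d2x(drr)    # labeled DRR rendered in X-ray appearance

    # Adversarial terms: each translation should fool its discriminator.
    p_drr, p_xray = D_drr(fake_drr), D_xray(fake_xray)
    loss_adv = adv_criterion(p_drr, torch.ones_like(p_drr)) \
             + adv_criterion(p_xray, torch.ones_like(p_xray))

    # Cycle-consistency: both round trips should reconstruct the input.
    recon_xray = G_d2x(fake_drr)
    recon_drr = G_x2d(fake_xray)
    loss_cyc = cycle_criterion(recon_xray, xray) \
             + cycle_criterion(recon_drr, drr)

    # Task-driven supervision: the labeled DRR, after a round trip through
    # the X-ray domain, must still be segmented correctly by the frozen DI2I,
    # tying the style transfer to the parsing task.
    loss_seg = seg_criterion(di2i(recon_drr), drr_label)

    return loss_adv + lam_cyc * loss_cyc + lam_seg * loss_seg
```

The discriminator updates follow the usual cycle-GAN alternating scheme and are omitted for brevity.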

The contributions of our work are: 1) We propose a novel model pipeline for X-ray image segmentation from unpaired synthetic DRRs. 2) We introduce an effective deep architecture, TD-GAN, for simultaneous image synthesis and segmentation without requiring any labeling effort on X-ray images. To the best of our knowledge, this is the first end-to-end framework for unsupervised medical image segmentation. 3) The entire model framework can be easily adjusted for unsupervised domain adaptation problems where labels from one domain are completely missing. 4) We conduct numerical experiments and demonstrate the effectiveness of the proposed model on over 300 unlabeled topograms and 500 unlabeled chest X-ray images, using synthetic DRRs generated from over 800 CT scans.

This paper is organized as follows. In Section 2 we review existing methods for image segmentation and domain adaptation. In Section 3 we formulate the problem and discuss our methodology in detail. Numerical experiments with quantitative and qualitative results are presented in Section 4. Finally, we conclude the paper and propose possible future directions in Section 5.

The key concepts were first introduced in Zhang et al. (2018) and have been extended in this paper. We present comparisons between several image-to-image network models on DRR segmentation and on X-ray segmentation with their induced TD-GAN variations. Compared with the dataset presented in Zhang et al. (2018), which consists of 153 topograms, we evaluate the proposed TD-GAN model on a larger dataset of 328 topograms with more detailed analysis. Furthermore, we demonstrate the effectiveness of the proposed framework through a qualitative study on over 500 chest X-ray images randomly selected from the NIH public dataset (Wang et al., 2017), using the same DRR dataset. Since this dataset does not contain ground truth annotations, we present qualitative results on these images. A quantitative study has also been added on the public JSRT dataset, which contains over 200 annotated chest X-rays.

Section snippets

Related works

Semantic Segmentation. Image segmentation is one of the fundamental problems in computer vision and medical image processing. Many approaches have been proposed in the literature, including level-set models (Vese and Chan, 2002), graph-cut models (Boykov and Funka-Lea, 2006) and learning-based models (Ronneberger et al., 2015).

In recent years, deep network models have been extensively studied and have shown better performance, usually at faster speed, than conventional methods. One of

Problem and methodology

In this section, we present our two-step model framework in detail. In Section 3.1, we present the first step, where an image-to-image network is trained for segmentation on synthetic DRRs (a minimal training sketch is given after this overview). In Section 3.2, we present the second step, which incorporates the pretrained model into a task driven generative adversarial network to achieve simultaneous image synthesis and segmentation on X-ray images.

Our goal is to learn an unsupervised multi-organ segmentation model on X-ray images using pixel-wise
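
To ground the first step (Section 3.1), here is a minimal supervised training sketch for the DI2I on labeled DRRs. The soft Dice loss and all hyperparameters are illustrative assumptions, with the backbone standing in for a dense-UNet-style architecture (Jégou et al., 2017); this is not the authors' implementation.

```python
import torch
import torch.nn as nn

def soft_dice_loss(logits, target, eps=1e-6):
    """Soft Dice over each organ channel, averaged; a common segmentation loss."""
    prob = torch.sigmoid(logits)                         # (B, C, H, W)
    inter = (prob * target).sum(dim=(2, 3))
    denom = prob.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (denom + eps)).mean()

def train_di2i(di2i: nn.Module, loader, epochs=50, lr=1e-4, device="cuda"):
    """Supervised pre-training on DRR/label pairs (lung, heart, liver, bone)."""
    opt = torch.optim.Adam(di2i.parameters(), lr=lr)
    di2i.to(device).train()
    for _ in range(epochs):
        for drr, label in loader:   # drr: (B,1,H,W); label: (B,4,H,W) masks
            opt.zero_grad()
            loss = soft_dice_loss(di2i(drr.to(device)), label.to(device))
            loss.backward()
            opt.step()
    return di2i
```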

Experiments and results

We validate our methodology on different datasets, both quantitatively and qualitatively. We use 815 labeled DRRs as the source-domain data, generated from 3D CT scans through the aforementioned simulation process. Most of the DRR images cover a broad field of view, from regions near the neck down to the kidneys. We apply TD-GAN on two different target domains, topograms and chest X-ray images. To demonstrate the effectiveness of the TD-GAN architecture, we vary the image-to-image networks and
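
The headline numbers in the abstract are average Dice scores. For reference, a standard per-organ Dice computation on binary masks is sketched below; the paper's exact evaluation protocol (e.g., averaging order across cases and organs) is not spelled out in this excerpt, so this is a generic implementation.

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """Dice coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))

def mean_dice(pred_masks: dict, gt_masks: dict) -> float:
    """Average Dice over the organ labels present in the ground truth."""
    return float(np.mean([dice_score(pred_masks[k], gt_masks[k])
                          for k in gt_masks]))
```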

Conclusions

In this paper, we studied the unsupervised multi-organ segmentation problem on X-ray images with a novel task driven generative adversarial network model. The proposed model framework takes synthetic labeled DRR images as input, and is able to produce meaningful segmentation results on real X-ray images without any ground truth annotations. It leverages a cycle-GAN substructure to achieve image style transfer and carefully designed add-on modules to simultaneously segment organs of interest. We

Declaration of Competing Interest

None.

Acknowledgment

The authors would like to thank Vivek Kumar Singh for aiding in the preparation of the topogram dataset. We also thank the reviewers for their helpful feedback and suggestions which greatly improved this paper.

References (43)

  • L.A. Vese et al.

    A multiphase level set framework for image segmentation using the Mumford and Shah model

    Int. J. Comput. Vis.

    (2002)
  • P. Aggarwal et al.

    Role of segmentation in medical imaging: a comparative study

    Int. J. Comput. Appl.

    (2011)
  • S. Albarqouni et al.

    X-ray in-depth decomposition: revealing the latent structures

    International Conference on Medical Image Computing and Computer-Assisted Intervention

    (2017)
  • V. Badrinarayanan et al.

    SegNet: a deep convolutional encoder-decoder architecture for image segmentation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2017)
  • K. Bousmalis et al.

    Unsupervised pixel-level domain adaptation with generative adversarial networks

    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • K. Bousmalis et al.

    Domain separation networks

    Advances in Neural Information Processing Systems

    (2016)
  • Y. Boykov et al.

    Graph cuts and efficient N-D image segmentation

    Int. J. Comput. Vis.

    (2006)
  • L.-C. Chen et al.

    DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2018)
  • J. Donahue et al.

    Semi-supervised domain adaptation with instance constraints

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2013)
  • Y. Ganin et al.

    Domain-adversarial training of neural networks

    The Journal of Machine Learning Research

    (2016)
  • R. Girshick

    Fast R-CNN

    2015 IEEE International Conference on Computer Vision (ICCV)

    (2015)
  • B. Gong et al.

    Connecting the dots with landmarks: discriminatively learning domain-invariant features for unsupervised domain adaptation

    International Conference on Machine Learning

    (2013)
  • K. He et al.

    Mask R-CNN

    Proceedings of the IEEE International Conference on Computer Vision

    (2017)
  • K. He et al.

    Deep residual learning for image recognition

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • G. Huang et al.

    Densely connected convolutional networks

    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2017)
  • S. Jégou et al.

    The one hundred layers Tiramisu: fully convolutional DenseNets for semantic segmentation

    2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

    (2017)
  • H. Kwak et al.

    Ways of conditioning generative adversarial networks

    Workshop on Adversarial Training, Neural Information Processing Systems (NeurIPS)

    (2016)
  • S. Liu et al.

    Path aggregation network for instance segmentation

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2018)
  • J. Long et al.

    Fully convolutional networks for semantic segmentation

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2015)
  • M. Long et al.

    Transfer feature learning with joint distribution adaptation

    Proceedings of the IEEE International Conference on Computer Vision

    (2013)