Style transfer in conditional GANs for cross-modality synthesis of brain magnetic resonance images

https://doi.org/10.1016/j.compbiomed.2022.105928

Highlights

  • Style transfer is introduced into the conditional GAN architecture to address cross-modality MR image synthesis.

  • A conditional GAN model with hierarchical feature mapping and fusion (ST-cGAN) is proposed to obtain style-enhanced synthetic images.

  • Per-pixel random noise is added at different scales of the proposed generator network to train a robust generator that is insensitive to noise.

  • The experimental results confirm the effectiveness of ST-cGAN from different perspectives of image quality assessment.

Abstract

Magnetic resonance imaging (MRI) has become one of the most standardized and widely used neuroimaging protocols in the detection and diagnosis of neurodegenerative diseases. In clinical scenarios, multi-modality MR images can provide more comprehensive information than single-modality images. However, high-quality multi-modality MR images can be difficult to obtain in the actual diagnostic process due to various uncertainties. Efficient methods of modality complementation and synthesis have therefore attracted increasing attention in the research community. In this article, style transfer is introduced into the conditional generative adversarial network (cGAN) architecture. A cGAN model with hierarchical feature mapping and fusion (ST-cGAN) is proposed to address the cross-modality synthesis of MR images. To move beyond the sole focus on pixel-wise similarity that characterizes most cGAN-based methods, the proposed ST-cGAN takes advantage of style information and applies it to the synthetic image's content structure. Taking images of two modalities as conditional input, ST-cGAN extracts style features at different levels and integrates them with the content features to form a style-enhanced synthetic image. Furthermore, the proposed model is made robust to random noise by adding noise input to the generator. A comprehensive analysis is performed by comparing the proposed ST-cGAN with other state-of-the-art baselines based on four representative evaluation metrics. The experimental results on the IXI (Information eXtraction from Images) dataset verify the validity of ST-cGAN from different evaluation perspectives.

Introduction

Magnetic resonance imaging (MRI) provides an intuitive method for studying the structure and function of the human brain, and has become one of the most standardized and widely used neuroimaging methods in the detection and diagnosis of neurodegenerative diseases. It is a non-invasive and radiation-free imaging technique used to generate high-resolution 3D or 4D images of different brain tissues. Different pulse sequences and parameters in the scanning process of imaging equipment can generate images of various tissue contrasts. These multi-modality MR images can display valuable information on tissue structure and function from different aspects. For example, T1-weighted (T1) images, characterized by short repetition time (TR) and short echo time (TE), are better suited for observing anatomical structures and distinguishing between gray matter (GM) and white matter (WM). T2-weighted (T2) images, with long TR and long TE, provide better visualization of tissue lesions. Fluid-attenuated inversion recovery (FLAIR) is a T2-weighted contrast image with an inversion recovery sequence to improve the conspicuity of lesions in WM. The cerebrospinal fluid (CSF) appears black on FLAIR images and white on T2 images [1], [2].

In most clinical scenarios, multi-modality MR images are the preferred choice as they provide more comprehensive information for disease diagnosis than single-modality images [3]. For example, multimodal images are beneficial in unveiling subtle pathologic changes of the brain tissues that are hard to appreciate in single-modality images. However, different medical institutions are limited to their respective scanning equipment and imaging protocols [4], which may cause uncertainties in collecting paired multi-modality MR images. Additionally, some modalities of MR images become unusable during data acquisition and storage due to artifacts, improper scanning parameters or the loss of some sequences [5], [6]. All of these conditions complicate the application of multimodal MR images in clinical diagnosis, creating uncertainties in fully exploiting their true efficacy. Moreover, rescanning the same subject to obtain the missing or unavailable modalities would be highly impractical: apart from the high cost, the abnormalities detected in the subjects' brains change over time, making the new data no longer match the original data. Therefore, cross-modality synthesis of MR images has been pursued to address modality absence and inconsistency.

Image synthesis can be summarized as a process of generating new images similar to the original data by learning the image features of the original data domain. Since image synthesis can serve as an effective method for data augmentation and as a preprocessing step for various downstream image processing tasks (e.g., segmentation and classification), it has recently attracted considerable attention, and research on medical image synthesis has grown accordingly. We review this work in two categories of medical image synthesis: unconditional synthesis and cross-modality synthesis (a type of conditional synthesis).

(1) Unconditional image synthesis

Unconditional synthesis aims to learn the data distribution of the original images and generate new images satisfying that distribution without any other conditional item [7]. Among the various image synthesis methods, algorithms based on generative deep learning have made breakthroughs in different applications. As one of the most representative approaches, generative adversarial networks (GANs) [8] broaden the boundaries of traditional patterns in medical imaging because of their ability to generate high-quality and realistic images [7], [9]. Calimeri et al. [10] used Laplacian generative adversarial networks (LAPGAN) [11] to progressively generate brain MR images from coarse features to fine features. The evaluation results produced by quantitative metrics and experts' manual inspection showed its effectiveness in generating realistic brain MR images. The clinical demand for medical image resolution has encouraged researchers to try more GAN frameworks capable of generating high-resolution images. Beers et al. [12] introduced progressively grown GANs (PGGAN) [13] into the synthesis of multi-modal MR images of gliomas as well as fundus photographs of vascular lesions, gradually generating images from a low resolution to the desired resolution.

In addition to GANs, a variety of deep generative models have emerged in the field of unconditional image synthesis, and some scholars have applied multiple generative models to the synthesis of MR images to obtain a comprehensive comparison. Zhuang et al. [14] compared Gaussian mixture models (GMMs) [15], variational auto-encoders (VAEs) [16] and GANs on data augmentation of functional MRI (fMRI). They found that the improved Wasserstein GAN [17] framework and VAEs with conditional variants could generate high-quality, diverse and task-dependent brain images. Kwon et al. [18] leveraged VAEs and GANs to build a framework for normal and pathological brain MR image synthesis named auto-encoding GAN [19], which adds a code discriminator to the network structure. Their hybrid model succeeded in alleviating image blurriness and mode collapse when generating MR images. Unconditional image synthesis techniques are able to synthesize realistic and diverse images, and are increasingly used as a means of data augmentation. However, because these models have no conditional terms when synthesizing images, they are ill-suited to tasks that require targeted modality synthesis. In this case, cross-modality synthesis with conditional constraints is needed to achieve one-to-one correspondence between images of different modalities.

(2) Cross-modality image synthesis

Cross-modality image synthesis, also known as image modality translation, enables the conversion of one possible representation of the image content into another given enough training data, which is essentially a pixel-to-pixel mapping problem [7]. Machine learning approaches were quickly introduced into cross-modality image synthesis. Jog et al. [20] adopted random forest regression to predict the intensities of brain tissue contrasts given an input image. This approach was able to synthesize both T2-weighted and FLAIR images with fast computation. Chartsias et al. [6] proposed a fully convolutional neural network model based on a modality-invariant latent representation to synthesize multi-modality MR images from multi-modality input. It embedded the input modalities into a shared latent space and transformed the fused representation into the target modality through a decoder.

Unsurprisingly, GAN-based methods are also widely studied in cross-modality image synthesis. Among this kind of research, the most prevalent methods are based on conditional GANs [21], which learn a representation from conditional input to target output. Dar et al. [22] utilized conditional GANs to conduct multi-contrast image synthesis, that is, mutual translation between T1-weighted and T2-weighted images. They further improved the synthesis quality by adding neighboring cross-section images to the model. Yu et al. [23] focused on the influence of image texture details on the content structure of the synthesized image, and proposed an edge-aware GAN that introduces an edge detector for multi-modality brain MR image synthesis. They subsequently proposed a sample-adaptive GAN to enhance local spatial learning for individual samples [24]: the model learns along two paths, one capturing the global spatial mapping of every sample, and the other mapping neighboring samples conditioned on the individual sample and fusing the target-modality feature information, so that the model can be flexibly adjusted for cross-modality synthesis. Sharma and Hamarneh [25] implemented a multi-modal GAN to supplement missing MRI pulse sequences. The multi-input multi-output (MIMO) model was able to synthesize missing pulse sequences from any combination of available pulse sequences.

Since cross-modality image synthesis is a pixel-to-pixel mapping problem [7], most conditional GAN-based methods focus on the one-to-one correspondence of pixels between the synthetic image and the reference modality image when designing the network. Among them, pix2pix [26] aims to maximize the pixel-wise intensity similarity between the synthetic image and the reference image, which requires two matched modality images as input during model training. Obtaining sufficient paired images of two modalities for model training can be quite challenging in practice, and overemphasizing pixel-wise similarity may neglect information such as shape, texture, visual patterns and other style features. Based on the concept of image style, we introduce style transfer [27] into the conditional GAN-based cross-modality image synthesis framework. Although some studies have been carried out on style-based image translation [28], [29], [30], [31], the integration of style transfer and conditional GANs, taking advantage of their respective strengths, is a new and promising attempt in cross-modality synthesis of MR images. In the proposed hierarchical style transfer conditional GAN model, the stylistic similarity between the synthetic image and the target modality is enhanced through the fusion of image content and style features in different layers of the network. The integrated quality of the synthetic image produced by the proposed method is further improved by combining pixel-level and style-level similarities, which aligns better with human visual perception.

The main contributions of this work include the following:

(1) Style transfer is introduced into the conditional GAN architecture and a generative model with hierarchical feature mapping and fusion (ST-cGAN) is proposed. The proposed model receives two modalities as conditional input and extracts content and style features in different layers of the network. It applies style transfer and feature fusion to the hybrid features to obtain a style-enhanced synthetic image, which makes the synthetic image closely resemble the target modality image from a stylistic point of view and effectively improves image quality (a minimal illustration of this style-fusion idea is sketched after this list).

(2) Since noise and other artifacts may greatly affect the readability of MR images, it is essential to improve the robustness of the model to noise. The proposed model accounts for the effect of noise on image quality by adding random disturbance at different scales of the generator network during image synthesis. The noise input turns out to be helpful for training a robust generator that is insensitive to noise (the sketch after this list also illustrates this noise injection).

(3) Image quality assessment is a complicated problem and different evaluation metrics may yield discrepant results. To provide a comprehensive and reasonable comparison between the proposed method and baseline methods, evaluation metrics of four representative dimensions (i.e., pixel-based, structure-based, feature-based and distribution-based) are utilized to assess the synthetic image. The comparison results verify the validity of the ST-cGAN method, while these metrics help reveal the strengths and weaknesses of each method from different perspectives and provide ideas for targeted improvements.
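To make contributions (1) and (2) concrete, the following is a minimal PyTorch sketch of the two generator-side ideas: adaptive instance normalization (AdaIN, the style-transfer operation named in the conclusion) to impose target-modality style statistics on content features, and learned-scale per-pixel noise injection at a single generator scale. All module and parameter names here are illustrative assumptions; the sketch does not reproduce the paper's actual hierarchical architecture.

```python
# Minimal sketch (not the authors' code): AdaIN-based style fusion with
# per-pixel noise injection, the two ideas in contributions (1) and (2).
import torch
import torch.nn as nn


def adain(content, style, eps=1e-5):
    """Align the channel-wise mean/std of content features to style features."""
    # content, style: (N, C, H, W) feature maps from some encoder layer
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean


class NoisyStyleBlock(nn.Module):
    """One hypothetical generator scale: AdaIN fusion plus noise injection."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # one learnable noise weight per channel, initialized to zero
        self.noise_weight = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.act = nn.LeakyReLU(0.2)

    def forward(self, content_feat, style_feat):
        x = adain(content_feat, style_feat)   # style-enhance the content features
        noise = torch.randn_like(x[:, :1])    # per-pixel noise, shared across channels
        x = x + self.noise_weight * noise     # scale-specific random disturbance
        return self.act(self.conv(x))
```

In ST-cGAN, style fusion and noise injection are reported at several layers of the generator (hierarchical feature mapping and fusion); the block above shows a single scale only.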

The rest of this article is organized as follows: an elaborate description of the methodology and our proposed model is provided in Section 2. Section 3 introduces our experimental design and its implementation. The experimental results and detailed analysis are presented in Section 4. A brief conclusion along with prospects for future work is given in Section 5.

Section snippets

Proposed method

In the original GANs proposed by Goodfellow et al. [8], the generator network G is trained to learn a transformation from random noise z in a prior distribution to the target data distribution. Meanwhile, the discriminator network D is trained to discriminate the generated samples from the real ones. Conditional GANs (cGAN) [21] learn a mapping from conditioned input to target output, which differs from generating data from random noise. Image-to-image translation [32], which generates target images conditioned on given input images, follows this paradigm.
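For reference, the conditional GAN objective that this paragraph summarizes can be written in its standard form [21] (reproduced from the literature, not from this paper's snippet):

$$\min_G \max_D \; \mathbb{E}_{x,y}\big[\log D(x,y)\big] + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x,z))\big)\big]$$

where $x$ is the conditional input image, $y$ the real target modality image, and $z$ the random noise input to the generator.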

Experiments and implementation

This section introduces the dataset used in this study and the experimental settings. We also provide a description of the comparison methods and evaluation metrics. The experiments were run on a 3.50 GHz CPU with 192 GB RAM and an NVIDIA Quadro P4000 GPU with 8 GB of memory.
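The exact metrics are not named in this snippet, only their four dimensions (pixel-, structure-, feature- and distribution-based). As a hedged illustration, common instantiations of the first two dimensions are PSNR and SSIM, computed here with scikit-image; the metric choice is our assumption, not a statement of the paper's protocol.

```python
# Illustrative example: pixel-based (PSNR) and structure-based (SSIM) quality
# metrics on a synthetic slice versus its ground-truth reference slice.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_pair(reference: np.ndarray, synthetic: np.ndarray) -> dict:
    """Score one synthetic 2D slice against the reference modality slice."""
    data_range = float(reference.max() - reference.min())
    return {
        "psnr": peak_signal_noise_ratio(reference, synthetic, data_range=data_range),
        "ssim": structural_similarity(reference, synthetic, data_range=data_range),
    }


# Dummy slices stand in for real IXI data here.
ref = np.random.rand(256, 256).astype(np.float32)
syn = (ref + 0.05 * np.random.randn(256, 256)).astype(np.float32)
print(evaluate_pair(ref, syn))
```

Feature-based and distribution-based metrics (e.g., perceptual distances or FID) additionally require a pretrained feature extractor, so they are omitted from this sketch.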

Results and discussion

This section reports the results and delivers an elaborate analysis. Section 4.1 presents the experimental results on the IXI dataset, and a thorough analysis of these results is conducted from a statistical perspective in Section 4.2. In Section 4.3, we perform experiments to discuss the effect of noise on the synthetic images.

Conclusion

In this study, the style transfer technique is introduced into the conditional GAN architecture and the ST-cGAN model is proposed to address cross-modality image synthesis. ST-cGAN receives images of two modalities as conditional input and conducts hierarchical feature mapping and fusion. The style features of the target modality image are extracted and applied to the content features of the synthetic image by adaptive instance normalization, making the synthetic image and the target image possess more consistent style characteristics.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. This research made use of the open-source IXI dataset (http://brain-development.org/ixi-dataset/), which hosts resources for the computational analysis of brain development and has made great contributions to brain science research.

References (65)

  • Chartsias, A., et al., Multimodal MR synthesis via modality-invariant latent representation, IEEE Trans. Med. Imaging (2018)
  • Yi, X., et al., Generative adversarial network in medical imaging: a review, Med. Image Anal. (2018)
  • Goodfellow, I., et al., Generative adversarial nets
  • Kazeminia, S., et al., GANs for medical image analysis (2018)
  • Calimeri, F., et al., Biomedical data augmentation using generative adversarial neural networks
  • Denton, E.L., Chintala, S., Szlam, A., Fergus, R., Deep generative image models using a Laplacian pyramid of adversarial...
  • Beers, A., et al., High-resolution medical image synthesis using progressively grown generative adversarial networks (2018)
  • Karras, T., Aila, T., Laine, S., Lehtinen, J., Progressive growing of GANs for improved quality, stability, and variation,...
  • Zhuang, P., et al., fMRI data augmentation via synthesis
  • Richardson, E., et al., On GANs and GMMs (2018)
  • Kingma, D.P., Welling, M., Auto-encoding variational bayes, in: Proceedings of the 2nd International Conference on...
  • Gulrajani, I., et al., Improved training of Wasserstein GANs (2017)
  • Kwon, G., et al., Generation of 3D brain MRI using auto-encoding generative adversarial networks (2019)
  • Rosca, M., Variational approaches for auto-encoding generative adversarial networks (2017)
  • Mirza, M., et al., Conditional generative adversarial nets (2014)
  • Dar, S.U., et al., Image synthesis in multi-contrast MRI with conditional generative adversarial networks, IEEE Trans. Med. Imaging (2019)
  • Yu, B., et al., Ea-GANs: edge-aware generative adversarial networks for cross-modality MR image synthesis, IEEE Trans. Med. Imaging (2019)
  • Yu, B., et al., Sample-adaptive GANs: linking global and local mappings for cross-modality MR image synthesis, IEEE Trans. Med. Imaging (2020)
  • Sharma, A., et al., Missing MRI pulse sequence synthesis using multi-modal generative adversarial network, IEEE Trans. Med. Imaging (2019)
  • Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A., Image-to-image translation with conditional adversarial networks, in:...
  • Gatys, L.A., et al., A neural algorithm of artistic style (2015)
  • Gatys, L.A., et al., Preserving color in neural artistic style transfer (2016)