Improved generative adversarial network for retinal image super-resolution

https://doi.org/10.1016/j.cmpb.2022.106995

Highlights

  • Developed an improved generative adversarial network.

  • Designed a novel residual attention block.

  • Used the Charbonnier loss function instead of the MSE loss function.

  • Removed the BN layer and added multiple updated residual blocks.

Abstract

Background and objective

The retina is the only organ in the body that can be observed non-invasively using visible light. By analyzing retinal images, we can achieve early screening, diagnosis, and prevention of many ophthalmological and systemic diseases, helping patients avoid the risk of blindness. Owing to their powerful feature extraction capabilities, many deep learning super-resolution reconstruction networks have been applied to retinal image analysis and have achieved excellent results.

Methods

Because the results of current super-resolution reconstruction networks lack high-frequency information and have poor visual quality at large scale factors, we present an improved generative adversarial network (IGAN) algorithm for retinal image super-resolution reconstruction. Firstly, we construct a novel residual attention block to restore the high-frequency information and texture details that are missing from reconstruction results at large scale factors. Secondly, we remove the Batch Normalization layer, which affects the quality of image generation, from the residual network. Finally, we use the more robust Charbonnier loss function instead of the mean square error loss function, together with a total variation (TV) regularization term to smooth the training results.

Results

Experimental results show that our proposed method significantly improves objective evaluation indicators such as peak signal-to-noise ratio and structural similarity. The reconstructed images have richer texture details and better visual quality than those produced by state-of-the-art image super-resolution methods.

Conclusion

Our proposed method can better learn the mapping relationship between low-resolution and high-resolution retinal images. This method can be effectively and stably applied to the analysis of retinal images, providing an effective basis for early clinical treatment.

Introduction

Retinal image analysis is an important part of medical image analysis. It enables the diagnosis of many blinding ophthalmological diseases, such as retinoblastoma and age-related macular degeneration [1], [2]. In addition, because retinal images are acquired non-invasively, super-resolution (SR) reconstruction of retinal images helps experts achieve an earlier and more comprehensive diagnosis of blinding retinal diseases; at the same time, it can help control the deterioration of the condition and help patients avoid the risk of blindness [3].

With the development of artificial intelligence, more and more researchers are applying deep learning SR reconstruction technology to medical image processing, and deep learning-based retinal image analysis makes diagnosis results more objective and quantitative [4]. At the same time, it can reach a level similar to that of experts in the field without requiring long-term professional training. This helps address the small number of ophthalmology experts in China and the difficulty of screening in communities and remote areas, for example screening for retinopathy of prematurity, diabetic retinopathy, and other eye diseases [5].

Single-image super-resolution (SISR) reconstruction is one of the important research directions in computer vision and image processing [6]. Without improving the hardware of the imaging equipment, the image resolution is increased through signal processing and software methods, which is known as SR reconstruction. According to the reconstruction method used, standard SISR reconstruction algorithms can be divided into three categories: interpolation-based SR reconstruction [7], reconstruction-based SR reconstruction [8], and learning-based SR reconstruction [9].

Artificial intelligence technology has emerged in recent years, and deep learning neural networks have been applied in various fields. Convolutional neural networks (CNNs) are good at extracting high-level abstract features of data and learning the underlying distribution of the data. SR research based on deep learning has received extensive attention with the application of CNNs in the image field [10]. Dong et al. [11] first proposed a SISR reconstruction method using a super-resolution CNN (SRCNN). However, the network has few layers, converges slowly, and uses small convolution kernels. Furthermore, the extracted features are all local, which makes texture details difficult to recover and leads to poor reconstruction results. Subsequently, Dong et al. [12] proposed a fast super-resolution convolutional neural network (FSRCNN), which replaced the 5 × 5 convolution kernel in SRCNN with two concatenated 3 × 3 convolution kernels to reduce parameters and increase the number of network layers.

Building on the SRCNN method, Shi et al. [13] added a sub-pixel convolutional layer, which converts low-resolution (LR) images into high-resolution (HR) images efficiently and in real time. He et al. [14] proposed the deep residual network (ResNet) for image recognition, which alleviates the vanishing-gradient problem that arises as the number of network layers increases. Kim et al. [15] proposed a very deep convolutional network (VDSR) based on ResNet, using a deeper VGG network structure model, reaching a network depth of twenty layers, and significantly improving the network convergence speed through residual learning. Lim et al. [16] proposed an enhanced deep super-resolution network (EDSR), which removes the Batch Normalization (BN) layer in the residual block, saving 40% of the memory usage and making it possible to build a larger model with better performance. Lai et al. [17] proposed Deep Laplacian Pyramid Networks for fast and accurate super-resolution (LapSRN), which replaced the L2 loss function with the Charbonnier loss function, trained the network with deep supervision, and achieved high-quality reconstruction.
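
The sub-pixel convolutional layer mentioned above ends with a rearrangement step, often called pixel shuffle, that turns C·r² low-resolution feature maps into C feature maps that are r times larger in each dimension. The following is a minimal sketch of that rearrangement only, written in plain Python on nested lists for self-containment. The function name `pixel_shuffle` and the channel ordering are assumptions chosen for illustration, not taken from the paper.

```python
def pixel_shuffle(features, r):
    """Rearrange C*r*r feature maps of size H x W into C maps of size (H*r) x (W*r).

    `features` is a nested list indexed as features[channel][row][col].
    The channel ordering used here is an illustrative assumption.
    """
    c_in = len(features)
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    h, w = len(features[0]), len(features[0][0])
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h * r):
            for x in range(w * r):
                # The sub-pixel offset (y % r, x % r) selects which input channel
                # supplies this output pixel; (y // r, x // r) is the LR location.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = features[src][y // r][x // r]
    return out


# Four 2x2 feature maps become a single 4x4 map for r = 2.
lr_features = [[[4 * k + 2 * i + j for j in range(2)] for i in range(2)]
               for k in range(4)]
hr = pixel_shuffle(lr_features, 2)
print(len(hr), len(hr[0]), len(hr[0][0]))
```

Each low-resolution pixel expands into an r × r patch whose values come from r² different channels, so the upscaling is learned by the preceding convolution rather than fixed by an interpolation kernel.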

In addition, inspired by the Generative Adversarial Network (GAN), Ledig et al. [18] proposed the super-resolution generative adversarial network (SRGAN), which was the first to use a GAN in the field of image SR and improves the perceptual quality of images by enhancing the realism of fine details. Wang et al. [19] proposed enhanced super-resolution generative adversarial networks (ESRGAN) based on SRGAN, replacing residual blocks with dense blocks and removing the BN layer, which significantly improved the reconstruction results. SR methods based on neural networks have made significant progress, and researchers have applied them to the field of medical imaging. Hatvani et al. [20] used a U-Net network and a sub-pixel network to perform super-resolution processing on 2D dental computed tomography images; the model used the Mean Square Error (MSE) loss [21] and a total variation regularization loss. This method achieved good results and helped experts better observe medically significant features such as the root canal's size, shape, and curvature.
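
To make the loss terms named in this introduction concrete, the sketch below compares the MSE loss, the Charbonnier loss, and an anisotropic total variation penalty on a tiny single-channel image. It is an illustration under stated assumptions: the value of `eps`, the anisotropic form of the TV term, and the function names are hypothetical choices, and the snippets available here do not state how the paper weights or combines these terms.

```python
import math


def mse_loss(pred, target):
    """Mean of squared pixel differences."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt(diff^2 + eps^2); eps = 1e-3 is an assumed value."""
    n = len(pred) * len(pred[0])
    return sum(math.sqrt((p - t) ** 2 + eps ** 2)
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n


def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighbouring pixels."""
    h, w = len(img), len(img[0])
    vertical = sum(abs(img[y + 1][x] - img[y][x])
                   for y in range(h - 1) for x in range(w))
    horizontal = sum(abs(img[y][x + 1] - img[y][x])
                     for y in range(h) for x in range(w - 1))
    return vertical + horizontal


target = [[0.0, 0.0], [0.0, 0.0]]
outlier = [[0.0, 0.0], [0.0, 10.0]]  # one badly wrong pixel

mse_value = mse_loss(outlier, target)
charbonnier_value = charbonnier_loss(outlier, target)
tv_value = total_variation(outlier)
print(mse_value, charbonnier_value, tv_value)
```

For the single outlying pixel, the squared error grows quadratically with the size of the error while the Charbonnier term grows roughly linearly, which is the usual sense in which it is called more robust. The TV term is zero for a constant image and increases with every jump between neighbouring pixels, so penalizing it favours smoother outputs.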

SRGAN

SRGAN is a model for SR reconstruction of images based on the GAN framework. It consists of two parts: a generator and a discriminator. $I^{SR}$ denotes the SR image reconstructed by the SRGAN network, $I^{HR}$ is an HR image, and $I^{LR}$ is the LR image corresponding to $I^{HR}$, obtained from $I^{HR}$ by down-sampling. $G$ denotes the generator network and $D$ the discriminator network. The schematic diagram of the SRGAN model is shown in Fig. 1.

The generator generates $I^{SR}$ by inputting $I^{LR}$

Methodology

Owing to its excellent image generation capability, the GAN can help overcome the obstacles that existing deep learning methods face in the field of retinal imaging, ease the data limitations of retinal image analysis, and enable deep learning to be more widely used in the field of retinal imaging. At the same time, the GAN's discriminator network helps the network output finer local details, which is critical for the processing of retinal image

Dataset and training details

The dataset used in the experiment is DIV2K [29], a high-quality 2K-resolution image dataset proposed in recent years, containing 800 training images, 100 validation images, and 100 test images. However, since the test images have not been released yet, the Set5 [30], Set14 [31], and Urban100 [32] datasets are used here for testing. All experiments are performed between LR images and HR images for 4× enlargement. Owing to the limitations of the experimental equipment, the image is
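
Peak signal-to-noise ratio, one of the objective indicators named in the abstract, can be computed between an HR reference and its reconstruction as in the following minimal sketch. The function name `psnr`, the 8-bit peak value of 255, and the single-channel nested-list input are assumptions for illustration. The snippets available here do not state whether the paper evaluates on a luminance channel or crops image borders.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized grayscale images."""
    n = len(reference) * len(reference[0])
    mse = sum((a - b) ** 2
              for ra, rb in zip(reference, reconstructed)
              for a, b in zip(ra, rb)) / n
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


hr_reference = [[0, 255], [255, 0]]
sr_output = [[0, 250], [255, 5]]  # two pixels off by 5 grey levels

score = psnr(hr_reference, sr_output)
print(round(score, 2))
```

Higher values indicate a reconstruction closer to the reference. Because PSNR depends only on the mean squared error, it is usually reported alongside structural similarity, as in the abstract.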

Conclusion

In this paper, we propose an improved generative adversarial network (IGAN) algorithm, based on SRGAN, for retinal image SR reconstruction. It increases the high-frequency information of the image by constructing a residual block based on an attention convolutional neural network. Furthermore, we remove the BN layer that affects the quality of retinal image generation in the residual block, improving the network's training speed. Then, we introduce the more robust Charbonnier to replace the

Ethical approval

No ethics approval is required.

Declaration of Competing Interest

The authors declare that they have no conflicts of interest.

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61976215 and 62176259.

References (42)

  • R Keys. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. (1981)

  • T Dai et al. Second-order attention network for single image super-resolution

  • C Dong et al. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. (2016)

  • C Dong et al. Accelerating the super-resolution convolutional neural network

  • W Shi et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network

  • K He et al. Deep residual learning for image recognition

  • J Kim et al. Accurate image super resolution using very deep convolutional networks

  • B Lim et al. Enhanced deep residual networks for single image super-resolution

  • W S Lai et al. Deep Laplacian pyramid networks for fast and accurate super resolution

  • C Ledig et al. Photo-realistic single image super-resolution using a generative adversarial network (2017)

  • X Wang et al. ESRGAN: enhanced super-resolution generative adversarial networks
