
Information Sciences

Volumes 385–386, April 2017, Pages 457–474

From coarse- to fine-grained implementation of edge-directed interpolation using a GPU

https://doi.org/10.1016/j.ins.2017.01.002

Abstract

The new edge-directed interpolation (NEDI) algorithm is non-iterative and orientation-adaptive. It achieves better edge performance when enhancing remote sensing and natural images than conventional bi-linear and bi-cubic methods, and it is the theoretical foundation of many other, more complex regression and auto-regression interpolation methods. Despite this impressive performance, the computational complexity of NEDI is an obstacle to large-scale use, so parallel acceleration of NEDI offers strong versatility and extensibility. In this paper, we propose a fine-grained implementation of NEDI on a GPU. In the fine-grained approach, we assign the calculations for one unknown pixel to 2 × 2, 2 × 4, and 4 × 4 threads. On an NVIDIA TESLA K40C GPU with asynchronous I/O transfers, our fine-grained NEDI achieves a speedup of 99.09-fold, including the I/O transfer time, compared with the original single-threaded C CPU code compiled with -O2 optimization on an Intel Core™ i7-920. To demonstrate the effectiveness of our fine-grained scheme, we also compare the fine- and coarse-grained schemes by interpolating 720p video to 1440p; with the fine-grained scheme, we achieve real-time display. The fine-grained parallel mode can be extended to other algorithms based on regression and auto-regression schemes.

Introduction

Image interpolation is one of the most important areas of research in image processing. Images with high resolution and fine edge details are necessary for many visual systems and image processing tools [35]. To compensate for the limitations of hardware technology, image interpolation techniques are important for improving the visual quality of remote sensing images and natural images [28]. Such techniques estimate the values of unknown pixels through accurate computations and generate high-resolution images from low-resolution images [13]. In recent years, many image interpolation algorithms have been proposed. Most are based on the assumption that the values of unknown pixels can be obtained through linear combinations of the values of known pixels [2]. Nearest-neighbor, bi-linear, and bi-cubic interpolation all use this approach; in particular, bi-cubic methods are widely adopted in a variety of image viewers and image processing tools. For smooth image areas, even nearest-neighbor interpolation is effective because it is simple and efficient. In edge areas, however, pixel values change sharply and unknown pixels are difficult to estimate. Because images contain numerous edges, those methods [30], [34] are unable to achieve satisfactory edge performance or generate fine details. To improve edge continuity, many edge-directed interpolation algorithms have been proposed [1], [21], [24], [25], [30]. Their basic idea is to detect image edges and interpolate along them to enhance visual performance. So far, the most promising of these is the well-known new edge-directed interpolation (NEDI) algorithm proposed by Li and Orchard [21]. NEDI is based on two observations: (1) covariance-based adaptation can tune the prediction kernel support to match an arbitrarily oriented edge; and (2) the high-resolution (HR) image has local covariance similar to that of the corresponding low-resolution (LR) image [21]. Li's method first calculates the local covariance of the LR image and then obtains the interpolation weights through a series of matrix operations according to the type of edge. The effectiveness of covariance-based interpolation theory enables NEDI to dramatically improve subjective visual quality over these linear interpolations.

The NEDI algorithm is a classic interpolation algorithm from which many more advanced and complex algorithms [1], [3], [8], [9], [10], [15], [16], [24], [28] have been derived. Fattal proposed an upsampling algorithm via imposed edge statistics [8]. Freedman et al. proposed a video upscaling method based on local self-examples [9]. These improved algorithms provide better results at the cost of much greater computational complexity. In practice, an interpolation algorithm such as NEDI is restricted because its running time is too long. Parallel acceleration of NEDI can be generalized to other extended algorithms with the same theoretical basis, improving their computational efficiency and broadening their scope. Because the computations for different pixels are performed independently, the high-performance computing (HPC) capability of GPUs can be used to execute this algorithm in parallel over different pixels.
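As a brief, hedged illustration of the covariance-based weight computation described above (the notation here is our reconstruction following the formulation in [21], not a verbatim restatement): for each unknown high-resolution pixel, the four interpolation weights are obtained from a local least-squares fit over a window of the low-resolution image,

    \hat{\alpha} = (C^{\top} C)^{-1} C^{\top} \mathbf{y}, \qquad \hat{Y}_{2i+1,\,2j+1} = \sum_{k=0}^{3} \hat{\alpha}_{k} X_{k},

where \mathbf{y} stacks the M pixels of a local low-resolution window (e.g., M = 16 for a 4 × 4 window), each row of the M × 4 matrix C contains the four diagonal low-resolution neighbours of the corresponding window pixel, and X_0, ..., X_3 are the four diagonal neighbours of the pixel being interpolated.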

Recently, general-purpose computing on GPUs has become highly developed; current GPUs have thousands of cores and abundant memory bandwidth. With the significant increase in the number of cores and the amount of memory, GPU computing power has greatly improved, and GPUs have been adopted in many scientific and engineering applications to enhance computing performance. Currently, deep learning [6], [7], [11], [12], [14], [19], [31] depends heavily on GPU acceleration; for example, both convolution and pooling operations are well suited to parallel execution. Deep learning drives the development of GPUs, and GPU acceleration in turn speeds up the iteration of deep learning models [7], [11], [14], [19].

However, most existing image-processing applications executed on GPUs are based on coarse-grained schemes. In coarse-grained parallel applications, each thread completes an entire independent task and there is no communication between threads, which significantly reduces programming complexity. However, because of this lack of communication, only calculations that are already independent can be parallelized. Moreover, this model restricts the work assigned to each thread: it simply repeats independent tasks in parallel without decomposing the core algorithm. Cheng et al. proposed a parallel NEDI based on CUDA and obtained promising performance [5]. Kraus et al. proposed a GPU-based edge-directed interpolation method for adaptive image magnification [18]. Kui-Ying proposed a median-based parallel steering kernel regression for interpolation on GPUs [20]. Wu et al. proposed a coarse-grained NEDI scheme that achieved a 61.7-times speedup [32]. Song et al. adopted that regression model for intra-field de-interlacing on a GPU [33].
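As an illustrative sketch of such a coarse-grained mapping (hypothetical CUDA code; the kernel name nedi_coarse and the fixed placeholder weights are ours and are not taken from [5], [32], or the implementation described in this paper), one thread computes one unknown high-resolution pixel end to end:

    // Hypothetical coarse-grained mapping: one CUDA thread computes one unknown
    // HR pixel end to end. The covariance-based weight solve is replaced by fixed
    // placeholder weights here for brevity; a real NEDI kernel would compute the
    // weights locally from the LR image.
    __global__ void nedi_coarse(const float* __restrict__ lr,  // low-res image, w x h
                                float* __restrict__ hr,        // high-res image, 2w x 2h
                                int w, int h)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;          // LR column
        int i = blockIdx.y * blockDim.y + threadIdx.y;          // LR row
        if (i >= h - 1 || j >= w - 1) return;

        // Placeholder weights; NEDI derives these from local covariances.
        const float alpha[4] = {0.25f, 0.25f, 0.25f, 0.25f};

        // Weighted sum of the four diagonal LR neighbours of HR pixel (2i+1, 2j+1).
        float y = alpha[0] * lr[ i      * w +  j     ]
                + alpha[1] * lr[ i      * w + (j + 1)]
                + alpha[2] * lr[(i + 1) * w +  j     ]
                + alpha[3] * lr[(i + 1) * w + (j + 1)];
        hr[(2 * i + 1) * (2 * w) + (2 * j + 1)] = y;
    }

A launch such as nedi_coarse<<<dim3((w + 15) / 16, (h + 15) / 16), dim3(16, 16)>>>(d_lr, d_hr, w, h) would cover the image with one thread per unknown pixel.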

In our work, we first optimize coarse-grained new edge-directed interpolation (NEDI) and achieve clear progress. However, after a sequence of optimizations, we reach a bottleneck and cannot achieve further speedup. To overcome this obstacle, we propose a fine-grained NEDI in which most of the independent calculations previously assigned to one thread are performed by multiple threads. To achieve this goal, we consider many factors, particularly the correspondence between pixel coordinates and thread indices. Considering the specific circumstances of the coarse-grained scheme, we assign the calculations for one unknown pixel to 2 × 2, 2 × 4, and 4 × 4 threads in the fine-grained version. To demonstrate the effectiveness of the fine-grained scheme, we compare our coarse-grained and fine-grained schemes by interpolating a video from 720p to 1440p. Because of the speed of the fine-grained NEDI, the interpolated video can be computed and displayed in real time.
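The following hypothetical sketch illustrates the 2 × 2 case (the kernel name nedi_fine_2x2 and the exact work split are ours, and the weight solve is again abbreviated to placeholders; the paper's actual decomposition may differ): every 2 × 2 cluster of threads cooperates on one unknown pixel, each thread contributing one of the four weighted neighbour products and one thread of the cluster combining them.

    // Hypothetical fine-grained mapping: every 2 x 2 cluster of threads handles
    // ONE unknown HR pixel. Launch with blockDim.x and blockDim.y even and with
    // blockDim.x * blockDim.y * sizeof(float) bytes of dynamic shared memory.
    __global__ void nedi_fine_2x2(const float* __restrict__ lr,
                                  float* __restrict__ hr,
                                  int w, int h)
    {
        int sub_x = threadIdx.x & 1, sub_y = threadIdx.y & 1;    // position inside the cluster
        int j = (blockIdx.x * blockDim.x + threadIdx.x) >> 1;    // LR column of the pixel
        int i = (blockIdx.y * blockDim.y + threadIdx.y) >> 1;    // LR row of the pixel
        int k = sub_y * 2 + sub_x;                               // sub-task id, 0..3

        extern __shared__ float partial[];
        int tid = threadIdx.y * blockDim.x + threadIdx.x;

        float p = 0.0f;
        if (i < h - 1 && j < w - 1) {
            // Placeholder weights; in a full kernel the covariance-based weight
            // solve itself would also be split across the cluster.
            const float alpha[4] = {0.25f, 0.25f, 0.25f, 0.25f};
            p = alpha[k] * lr[(i + sub_y) * w + (j + sub_x)];    // one neighbour per thread
        }
        partial[tid] = p;
        __syncthreads();

        // Thread (0,0) of each cluster sums the four partial products.
        if (sub_x == 0 && sub_y == 0 && i < h - 1 && j < w - 1) {
            float y = partial[tid] + partial[tid + 1]
                    + partial[tid + blockDim.x] + partial[tid + blockDim.x + 1];
            hr[(2 * i + 1) * (2 * w) + (2 * j + 1)] = y;
        }
    }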

The rest of the paper is organized as follows. Section 2 gives a brief introduction to NEDI. Section 3 describes the GPU/CUDA parallel implementation of the edge-directed interpolation scheme. Section 4 compares the coarse- and fine-grained parallel NEDI schemes on 720p video. The parallel scheme includes the implementation procedure and the optimization techniques used in our experiments. We present our conclusions in Section 5.

Section snippets

Edge-directed image interpolation

Edges have important effects on the visual quality of images. Traditional interpolation algorithms cannot achieve satisfactory edge performance; two of the most common types of degradation they produce are blurred edges and artifacts. In recent years, a large number of edge-directed interpolation (EDI) methods have been proposed. Some have sought accurate models by matching the local geometric properties of the image with predefined templates to estimate the
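The following device function is a hedged sketch of the per-pixel weight estimation as we understand it from the formulation in [21] (the function name nedi_weights, the border clamping, and the small diagonal regularizer are our own choices, not necessarily those of the original implementation): it gathers a 4 × 4 training window around the pixel, forms the normal equations from each window pixel and its four diagonal low-resolution neighbours, and solves the resulting 4 × 4 system by Gauss-Jordan elimination.

    // Hedged sketch of the covariance-based weight solve for one unknown pixel.
    // lr is the low-resolution image (w x h); the four weights for the HR pixel
    // between LR pixels (i, j) and (i+1, j+1) are returned in alpha[0..3].
    __device__ void nedi_weights(const float* lr, int w, int h,
                                 int i, int j, float alpha[4])
    {
        float CtC[4][4] = {{0.0f}};
        float Cty[4]    = {0.0f, 0.0f, 0.0f, 0.0f};

        // 4 x 4 training window around (i, j); indices are clamped at the borders.
        for (int m = i - 1; m <= i + 2; ++m) {
            for (int n = j - 1; n <= j + 2; ++n) {
                int mm = min(max(m, 1), h - 2);
                int nn = min(max(n, 1), w - 2);
                float y = lr[mm * w + nn];
                // Four diagonal LR neighbours of the window pixel.
                float c[4] = { lr[(mm - 1) * w + (nn - 1)], lr[(mm - 1) * w + (nn + 1)],
                               lr[(mm + 1) * w + (nn - 1)], lr[(mm + 1) * w + (nn + 1)] };
                for (int a = 0; a < 4; ++a) {
                    Cty[a] += c[a] * y;
                    for (int b = 0; b < 4; ++b) CtC[a][b] += c[a] * c[b];
                }
            }
        }

        // Solve (C^T C) alpha = C^T y by Gauss-Jordan elimination; a small
        // diagonal regularizer keeps smooth (nearly singular) regions stable.
        for (int a = 0; a < 4; ++a) CtC[a][a] += 1e-3f;
        for (int p = 0; p < 4; ++p) {
            float piv = CtC[p][p];
            for (int b = p; b < 4; ++b) CtC[p][b] /= piv;
            Cty[p] /= piv;
            for (int r = 0; r < 4; ++r) {
                if (r == p) continue;
                float f = CtC[r][p];
                for (int b = p; b < 4; ++b) CtC[r][b] -= f * CtC[p][b];
                Cty[r] -= f * Cty[p];
            }
        }
        for (int a = 0; a < 4; ++a) alpha[a] = Cty[a];
    }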

Parameters of the target CUDA platform

Our target GPUs are an NVIDIA GTX480 and a TESLA K40C. The GTX480 belongs to NVIDIA's GTX400 series. It contains 15 streaming multiprocessors (SMs, Fermi architecture), each with 32 streaming processors (SPs); thus, a single GTX480 GPU has 480 compute cores. The TESLA K40C consists of 15 SMXs (Kepler architecture), each with 192 CUDA cores; thus, a TESLA K40C has 2880 compute cores. Many copies of the code can therefore be executed simultaneously on the available SMXs. The
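A minimal device-query sketch (standard CUDA runtime API calls, not code from the paper) can be used to confirm the SM count and compute capability quoted above; the number of cores per SM (32 on Fermi, 192 on Kepler) still has to be looked up per architecture.

    // Query and print basic properties of every visible CUDA device.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int d = 0; d < n; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("Device %d: %s, %d SMs, compute capability %d.%d, %.1f GiB global memory\n",
                   d, prop.name, prop.multiProcessorCount, prop.major, prop.minor,
                   prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        }
        return 0;
    }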

Comparison of experimental results from coarse- and fine-grained NEDI for 720p videos

To test performance on video, we experimented with a 720p video (Fig. 17). As shown in Fig. 18, the fine-grained version achieved a better speedup than the coarse-grained version. Owing to its higher degree of parallelism, the fine-grained scheme always performs better than the coarse-grained scheme. The advantage of GPUs lies in processing large amounts of data in parallel; however, GPU hardware resources are limited. Therefore, speedup increases

Conclusions

NEDI demonstrates high objective performance (PSNR) and generates images with high visual quality. This approach takes edge orientation into account to reduce the interpolation artifacts that afflict conventional bi-linear and bi-cubic interpolation algorithms. To accelerate this algorithm, we implemented a parallel NEDI on GPUs in coarse-grained mode and then proposed a fine-grained scheme. In the fine-grained NEDI, we overcome the bottleneck of the coarse-grained NEDI by converting

Acknowledgment

This work is supported by the National Natural Science Foundation of China (No. 61377011), the NRF of Korea (2015R1D1A1A01058171), and the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 12A054).

References (35)

  • C. Ding, D. Tao, Trunk-branch ensemble convolutional neural networks for video-based face recognition, arXiv...
  • C. Ding et al., A comprehensive survey on pose-invariant face recognition, ACM Trans. Intell. Syst. Technol. (TIST) (2016)
  • R. Fattal, Image upsampling via imposed edge statistics, ACM Trans. Graphics (2007)
  • G. Freedman et al., Image and video upscaling from local self-examples, ACM Trans. Graphics (2011)
  • A. Giachetti et al., Real-time artifact-free image upscaling, IEEE Trans. Image Process. (2011)
  • R. Girshick, Fast R-CNN
  • K. He et al., Deep residual learning for image recognition