From coarse- to fine-grained implementation of edge-directed interpolation using a GPU
Introduction
Image interpolation is one of the most important areas of research in image processing. Images with high resolution and fine edge details are necessary for many visual systems and image-processing tools [35]. To compensate for the limitations of hardware technology, image interpolation techniques are important for improving the visual quality of remote-sensing and natural images [28]. Such techniques estimate the values of unknown pixels and thereby generate high-resolution images from low-resolution ones [13]. In recent years, many image interpolation algorithms have been proposed. Most are based on the assumption that the values of unknown pixels can be obtained through linear combinations of the values of known pixels [2]. Nearest-neighbor, bilinear, and bicubic interpolation all use this approach. In particular, bicubic methods are widely adopted in image viewers and image-processing tools. For smooth image areas, even nearest-neighbor interpolation is effective because it is simple and efficient. In edge areas, however, pixel values change sharply, and the unknown pixels are difficult to estimate. Because images typically contain many edges, these methods [30], [34] are unable to achieve satisfactory edge performance or generate fine details. To improve edge continuity, many edge-directed interpolation algorithms have been proposed [1], [21], [24], [25], [30]. Their basic idea is to detect image edges and interpolate along them to enhance visual quality. So far, the most promising of these is the well-known new edge-directed interpolation (NEDI) algorithm proposed by Li and Orchard [21]. NEDI rests on two observations: 1. covariance-based adaptation can tune the prediction kernel support to match an arbitrarily oriented edge; 2. the high-resolution (HR) image has local covariance similar to that of the corresponding low-resolution (LR) image [21].
Li's method first calculates the local covariance of the LR image and then obtains the interpolation weights through a series of matrix operations according to the type of edge. The effectiveness of covariance-based interpolation enables NEDI to dramatically improve subjective visual quality over linear interpolation methods. NEDI is a classic interpolation algorithm from which many more advanced and complex algorithms [1], [3], [8], [9], [10], [15], [16], [24], [28] have been derived. Fattal proposed an upsampling algorithm via imposed edge statistics [8]. Freedman et al. proposed a video upscaling method based on local self-examples [9]. These improved algorithms provide better results at the cost of much greater computational complexity. In practice, an interpolation algorithm such as NEDI is limited because its running time is too long. Parallel acceleration of NEDI can be generalized to other algorithms with the same theoretical basis, improving their computational efficiency and expanding their scope of application. Because the computations for different pixels are performed independently, the high-performance computing (HPC) capability of GPUs can be exploited to run the algorithm in parallel across pixels.
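To make the covariance-based weight estimation concrete, the following is a minimal NumPy sketch of NEDI's first interpolation pass (predicting an HR pixel from its four diagonal LR neighbors). It omits the second pass, the edge-activity test that falls back to bilinear interpolation in smooth regions, and all boundary handling; the function names are ours, not from [21].

```python
import numpy as np

def nedi_weights(lr, i, j, win=4):
    """Estimate the 4 interpolation weights for the HR pixel that lies
    between LR pixels (i, j), (i, j+1), (i+1, j), (i+1, j+1).

    Each LR pixel y_k in a local window is regressed on its own four
    diagonal neighbours; the least-squares solution a minimises
    ||y - C a||^2, i.e. a = (C^T C)^{-1} C^T y.
    """
    rows, rhs = [], []
    for m in range(i - win // 2, i + win // 2):
        for n in range(j - win // 2, j + win // 2):
            # four diagonal neighbours of the training pixel (m, n)
            rows.append([lr[m - 1, n - 1], lr[m - 1, n + 1],
                         lr[m + 1, n - 1], lr[m + 1, n + 1]])
            rhs.append(lr[m, n])
    C = np.asarray(rows, dtype=np.float64)
    y = np.asarray(rhs, dtype=np.float64)
    # lstsq is the numerically safer form of (C^T C)^{-1} C^T y
    a, *_ = np.linalg.lstsq(C, y, rcond=None)
    return a

def nedi_pixel(lr, i, j, win=4):
    """Interpolate the HR pixel centred between (i, j) and (i+1, j+1)."""
    a = nedi_weights(lr, i, j, win)
    diag = np.array([lr[i, j], lr[i, j + 1], lr[i + 1, j], lr[i + 1, j + 1]],
                    dtype=np.float64)
    return float(a @ diag)
```

On locally stationary data the least-squares fit reproduces the local structure; for example, on the ramp image `lr[m, n] = m + n`, `nedi_pixel(lr, i, j)` returns exactly the midpoint value `i + j + 1`.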
Recently, general-purpose computing on GPUs has become highly developed; current GPUs have thousands of cores and abundant memory bandwidth. With the significant increase in the number of cores and in memory capacity, GPU computing power has greatly improved. In recent years, GPUs have been adopted in many scientific and engineering applications to enhance computing performance. Deep learning [6], [7], [11], [12], [14], [19], [31], in particular, now depends heavily on GPU acceleration; both convolution and pooling, for example, are well suited to parallel execution. Deep learning drives the development of GPUs, and GPU acceleration in turn speeds up the iteration of deep learning models [7], [11], [14], [19].
However, most existing image-processing applications executed on GPUs are based on coarse-grained schemes. In a coarse-grained parallel application, each thread completes an entire independent task and there is no communication between threads, which significantly reduces programming complexity. However, because of this lack of communication, only calculations that are already independent can be parallelized. Moreover, this model restricts what each thread can do: independent tasks are simply repeated in parallel, without decomposing the core algorithm. Cheng et al. proposed a parallel NEDI based on CUDA and obtained good performance [5]. Kraus et al. proposed a GPU-based edge-directed interpolation for adaptive image magnification [18]. Kui-Ying proposed a median-based parallel steering kernel regression for interpolation on GPUs [20]. Wu et al. proposed a coarse-grained NEDI scheme that achieved a 61.7-times speedup [32]. Song et al. adopted that regression model for intra-field de-interlacing on a GPU [33].
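In the coarse-grained model described above, the thread-to-work mapping is the standard one-thread-per-pixel CUDA launch. The Python sketch below simulates that mapping on the host; the block size and helper name are illustrative assumptions, not the paper's actual configuration.

```python
def coarse_grid(width, height, block=(16, 16)):
    """Mimic a CUDA launch in which one thread handles one unknown pixel.

    For every (blockIdx, threadIdx) pair, compute the pixel coordinate that
    thread would process via the standard mapping
    x = blockIdx.x * blockDim.x + threadIdx.x (and likewise for y);
    out-of-range threads simply do no work.
    """
    bx, by = block
    grid_x = (width + bx - 1) // bx   # ceil-divide, as when sizing a real grid
    grid_y = (height + by - 1) // by
    work = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            for ty in range(by):
                for tx in range(bx):
                    x = gx * bx + tx
                    y = gy * by + ty
                    if x < width and y < height:   # boundary guard
                        work.append((x, y))
    return work
```

Every pixel is visited exactly once and no two threads share data, which is what makes this scheme simple to program but also limits it to computations that are independent to begin with.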
In our work, we optimize coarse-grained new edge-directed interpolation (NEDI) and achieve a clear improvement. However, after a sequence of optimizations, we reach a bottleneck and cannot increase the speed further. To overcome this obstacle, we propose a fine-grained NEDI, in which most of the independent calculations previously assigned to a single thread are performed by multiple threads. To achieve this, we must consider many factors, particularly the correspondence between pixel coordinates and thread indices. Based on the structure of the coarse-grained scheme, we assign the calculations for each unknown pixel to 2 × 2, 2 × 4, or 4 × 4 threads in the fine-grained version. To demonstrate the effectiveness of the fine-grained scheme, we compare our coarse-grained and fine-grained schemes by interpolating a video from 720p to 1440p. Because of the speed of the fine-grained NEDI, the interpolation can be performed and displayed on a computer in real time.
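The index arithmetic behind such a fine-grained assignment can be sketched as follows: a flat thread index is split into a pixel index and a sub-task index within the cooperating 2 × 2, 2 × 4, or 4 × 4 group. The helper name and the flat-index layout are our assumptions for illustration, not the paper's actual kernel code.

```python
def fine_assignment(tid, group=(2, 2)):
    """Split a flat thread index into (pixel index, sub-task index) when a
    group of gx * gy cooperating threads shares one unknown pixel
    (2x2, 2x4, or 4x4 in the fine-grained scheme).

    The sub-task index selects which partial computation (e.g. which part
    of the covariance sums) this particular thread performs.
    """
    gx, gy = group
    threads_per_pixel = gx * gy
    pixel = tid // threads_per_pixel
    subtask = tid % threads_per_pixel
    return pixel, subtask
```

In a real CUDA kernel, the threads of one group would each accumulate partial sums and then combine them through a synchronization and reduction step before the final weight computation, which is exactly the inter-thread cooperation that the coarse-grained model cannot express.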
The rest of the paper is organized as follows. Section 2 gives a brief introduction to NEDI. Section 3 describes the GPU/CUDA parallel implementation of the edge-directed interpolation scheme, including the implementation procedure and the optimization techniques used in our experiments. Section 4 compares the coarse- and fine-grained parallel NEDI schemes on 720p video. We present our conclusions in Section 5.
Edge-directed image interpolation
Edges have important effects on the visual quality of images. Traditional interpolation algorithms cannot achieve satisfactory edge performance; two of the most common types of degradation they introduce are blurred edges and artifacts. In recent years, a large number of edge-directed interpolation (EDI) methods have been proposed. Some seek accurate models by matching the local geometric properties of the image with predefined templates to estimate the values of unknown pixels.
Parameters of the target CUDA platform
Our target GPUs are an NVIDIA GTX480 and a TESLA K40C. The GTX480 belongs to NVIDIA's GTX400 series. It contains 15 streaming multiprocessors (SMs, Fermi architecture), each with 32 streaming processors (SPs), for a total of 480 compute cores. The TESLA K40C consists of 15 SMXs (Kepler architecture), each with 192 CUDA cores; thus, a TESLA K40C has 2880 compute cores. Many copies of the code can therefore be executed simultaneously on the available SMXs.
Comparison of experimental results from coarse- and fine-grained NEDI for 720p videos
To test performance on video, we ran our experiments on a 720p video (Fig. 17). As shown in Fig. 18, the fine-grained version achieved better speedup than the coarse-grained version. Because of its higher degree of parallelism, the fine-grained scheme always outperforms the coarse-grained scheme. The advantage of GPUs lies in processing large amounts of data in parallel; however, the hardware resources of a GPU are limited, so the speedup increases only until those resources are fully occupied.
Conclusions
NEDI demonstrates high objective performance (PSNR) and generates images with high visual quality. The approach takes edge orientation into account to reduce the interpolation artifacts that afflict conventional bilinear and bicubic algorithms. To accelerate this algorithm, we implemented a coarse-grained parallel NEDI on GPUs and then proposed a fine-grained scheme. In the fine-grained NEDI, we overcome the bottleneck of the coarse-grained NEDI by converting the calculations previously assigned to a single thread into cooperative multithreaded computations.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 61377011), the NRF of Korea (2015R1D1A1A01058171), and the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 12A054).
References (35)
- et al., Fine-grained parallelization of lattice QCD kernel routine on GPUs, J. Parallel Distrib. Comput. (2008)
- et al., Designing of a type-2 fuzzy logic filter for improving edge-preserving restoration of interlaced-to-progressive conversion, Inf. Sci. (2009)
- et al., Locally estimated heterogeneity property and its fuzzy filter application for scanning format conversion, Inf. Sci. (2016)
- et al., A fast parallel Gauss Jordan algorithm for matrix inversion using CUDA, Comput. Struct. (2013)
- Image super-resolution survey, Image Vis. Comput. (2006)
- et al., Accuracy improvements and artifacts removal in edge based image interpolation
- et al., Exposing fine-grained parallelism in algebraic multigrid methods, SIAM J. Sci. Comput. (2012)
- et al., Improvement of a nonlinear image interpolation method based on heat diffusion equation
- et al., Exploring fine-grained task-based execution on multi-GPU systems
- et al., CUDA-based directional image/video interpolation
- A comprehensive survey on pose-invariant face recognition, ACM Trans. Intell. Syst. Technol. (TIST)
- Image upsampling via imposed edge statistics, ACM Trans. Graph.
- Image and video upscaling from local self-examples, ACM Trans. Graph.
- Real-time artifact-free image upscaling, IEEE Trans. Image Process.
- Fast R-CNN
- Deep residual learning for image recognition