Fast approximate DCT with GPU implementation for image compression

https://doi.org/10.1016/j.jvcir.2016.07.003

Highlights

  • A multiplierless, efficient, low-complexity 8-point approximate DCT is proposed.

  • A flow diagram is provided for the fast computation.

  • The proposed transform is suitable for fast, high-performance compression.

  • A GPU implementation is provided to further increase compression speed.

Abstract

Recent developments in image and video processing for multimedia and communication systems require fast 2-D Discrete Cosine Transforms (DCT). The DCT is widely employed in image compression for its high power compaction property. Approximate DCT transforms have been developed that can be computed faster than the original DCT while maintaining comparable levels of power compaction. This paper introduces a multiplierless, efficient, low-complexity 8-point approximate DCT. A flow diagram is provided for the fast implementation of the proposed transform. Only 17 additions are required for each of the forward and inverse transformations. A fast and efficient Graphics Processing Unit (GPU) implementation of the proposed transform is provided. Performance evaluation shows that the proposed transform outperforms other approximate DCT transforms in JPEG-like image compression.

Introduction

Image and video compression are essential in many recent systems. These include digital video in multimedia devices [1], geospatial remote sensing [2], automatic surveillance [3], traffic cameras [4], and homeland security [5]. Image and video sizes are growing rapidly, and the demand for faster compression keeps increasing. Transform methods using orthogonal kernel functions are commonly used in image compression. The Discrete Cosine Transform (DCT) [6] is built from real sinusoids and has many useful features. In addition to its orthogonal structure, the DCT has good power compaction properties. The DCT is the best practical substitute for the Karhunen-Loève transform, which is considered statistically optimal for power concentration [7]. For this reason, the DCT is at the core of image coding [8] and of image and video compression standards such as JPEG, MPEG-1, MPEG-2, H.261, and H.263 [9]. Although many fast algorithms [10] reduce the total number of operations required to compute such transforms, multiplications may still be unavoidable. To increase the speed of transformation while keeping the compaction properties of the DCT, the Signed Discrete Cosine Transform (SDCT) [11] was proposed, obtained simply by applying the signum function to the DCT elements. All elements of the transform are 0 or ±1, so no multiplications or transcendental expressions are required. Unfortunately, the SDCT transformation matrix is not orthogonal. The computation of the SDCT requires 24 additions [11]. The SDCT was followed by a stream of research papers, such as the Bouguezel-Ahmad-Swamy (BAS) series of algorithms [12], [13], [14], [15], [16]. Their aim is to modify the SDCT to further reduce the computational complexity and to achieve orthogonality. The strategy is to change some of the SDCT matrix elements and to set others to zero.
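
To make the SDCT construction concrete, the following Python sketch builds the orthonormal 8-point DCT-II matrix, applies the signum function element-wise, and inspects the Gram matrix S·S^t. The helper names (`dct_matrix`, `sgn`, `sdct_matrix`, `gram`) are illustrative and not taken from [11]. For N = 8 no DCT element is exactly zero, so every SDCT entry turns out to be ±1.

```python
import math

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: C[k][j] = c_k * cos(pi * (2j + 1) * k / (2n))."""
    rows = []
    for k in range(n):
        c_k = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([c_k * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)])
    return rows


def sgn(x):
    """Signum function returning -1, 0 or +1 as an int."""
    return (x > 0) - (x < 0)


def sdct_matrix(n=N):
    """Signed DCT: the signum of every DCT element."""
    return [[sgn(v) for v in row] for row in dct_matrix(n)]


def gram(m):
    """Return m * m^t; it is diagonal iff the rows of m are mutually orthogonal."""
    return [[sum(a * b for a, b in zip(r, s)) for s in m] for r in m]


S = sdct_matrix()
G = gram(S)
nonzero_off_diag = [(i, j) for i in range(N) for j in range(N) if i != j and G[i][j] != 0]

if __name__ == "__main__":
    for row in S:
        print(row)
    print("nonzero off-diagonal entries:", nonzero_off_diag)
```

The nonzero off-diagonal entries (for example G[1][3] = -4) all couple odd-indexed rows, which is exactly the loss of orthogonality that the BAS-style modifications set out to remove.
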
An interesting modification to the SDCT is introduced in [17] with the transform

$$
T_1 = D_1
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 0 & 0 & 0 & 0 & -1 & -1\\
1 & 0.5 & -0.5 & -1 & -1 & -0.5 & 0.5 & 1\\
0 & 0 & -1 & 0 & 0 & 1 & 0 & 0\\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\
1 & -1 & 0 & 0 & 0 & 0 & 1 & -1\\
0.5 & 0 & 0 & -0.5 & -0.5 & 0 & 0 & 0.5\\
0 & 0 & 0 & -1 & 1 & 0 & 0 & 0
\end{bmatrix},
$$

where $D_1 = \operatorname{diag}\left(\frac{1}{\sqrt{8}}, \frac{1}{2}, \frac{1}{\sqrt{5}}, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{8}}, \frac{1}{2}, 1, \frac{1}{\sqrt{2}}\right)$ scales each row to unit norm. In T1, some of the SDCT elements have been changed to ±0.5 and 24 elements have been cleared (set to zero). It has been shown in [17] that the power compaction of the transform, and consequently its compression capability, is high. Computing the transform requires 17 additions and two shifts. The transform in [17] is quasi-orthogonal: $T_1 T_1^{t}$ has two nonzero off-diagonal elements. However, the effect of these two elements on compression is negligible, so the transpose of the transform can be used as an approximation of its inverse.
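
The quasi-orthogonality of T1 can be checked directly. The sketch below assumes the unscaled kernel of T1 (the ±1, ±0.5 and 0 matrix before multiplication by D1, written out in the code as read from [17]); it counts the cleared elements, lists the nonzero off-diagonal entries of the kernel's Gram matrix, and derives the scaling entries as reciprocal row norms. Variable names are illustrative only.

```python
# Unscaled kernel of T1 (rows k = 0..7), before multiplication by D1.
M1 = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, -1, -1],
    [1, 0.5, -0.5, -1, -1, -0.5, 0.5, 1],
    [0, 0, -1, 0, 0, 1, 0, 0],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [1, -1, 0, 0, 0, 0, 1, -1],
    [0.5, 0, 0, -0.5, -0.5, 0, 0, 0.5],
    [0, 0, 0, -1, 1, 0, 0, 0],
]


def gram(m):
    """Return m * m^t as a list of lists."""
    return [[sum(a * b for a, b in zip(r, s)) for s in m] for r in m]


G1 = gram(M1)
zero_count = sum(v == 0 for row in M1 for v in row)
off_diag = [(i, j) for i in range(8) for j in range(8) if i != j and G1[i][j] != 0]
squared_norms = [G1[k][k] for k in range(8)]
# D1 holds the reciprocal row norms, so every scaled row has unit length.
d1 = [1 / v ** 0.5 for v in squared_norms]

print(zero_count)     # 24
print(off_diag)       # [(2, 6), (6, 2)]
print(squared_norms)  # [8, 4, 5.0, 2, 8, 4, 1.0, 2]
```

All kernel entries are multiples of 0.5, so the floating-point products and sums here are exact and the zero tests are reliable.
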

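A prominent member of the BAS series is the parametric transform [16], an 8-point orthogonal transform whose matrix contains a single parameter α:

$$
T_2 = D_2
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 0 & 0 & 0 & 0 & -1 & -1\\
1 & \alpha & -\alpha & -1 & -1 & -\alpha & \alpha & 1\\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0\\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0\\
1 & -1 & 0 & 0 & 0 & 0 & 1 & -1\\
\alpha & -1 & 1 & -\alpha & -\alpha & 1 & -1 & \alpha
\end{bmatrix},
$$

where $D_2 = \operatorname{diag}\left(\frac{1}{\sqrt{8}}, \frac{1}{2}, \frac{1}{\sqrt{4+4\alpha^2}}, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{8}}, \frac{1}{\sqrt{2}}, \frac{1}{2}, \frac{1}{\sqrt{4+4\alpha^2}}\right)$. The parameter α is selected as a small integer in order to minimize the complexity of T2 [16].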
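
As a sanity check of the orthogonality of T2, the sketch below forms D2 times the parametric kernel (entries written out in the code) for a given α and verifies that T2·T2^t is the identity, so the transpose serves as the exact inverse. The function names are hypothetical helpers, not part of [16]; the check itself works for any real α.

```python
import math


def bas_parametric(alpha):
    """T2 = D2 * K(alpha): the parametric kernel scaled to orthonormal rows."""
    a = alpha
    kernel = [
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 0, 0, 0, 0, -1, -1],
        [1, a, -a, -1, -1, -a, a, 1],
        [0, 0, 1, 0, 0, -1, 0, 0],
        [1, -1, -1, 1, 1, -1, -1, 1],
        [0, 0, 0, 1, -1, 0, 0, 0],
        [1, -1, 0, 0, 0, 0, 1, -1],
        [a, -1, 1, -a, -a, 1, -1, a],
    ]
    r = 1 / math.sqrt(4 + 4 * a * a)
    d2 = [1 / math.sqrt(8), 1 / 2, r, 1 / math.sqrt(2),
          1 / math.sqrt(8), 1 / math.sqrt(2), 1 / 2, r]
    return [[dk * v for v in row] for dk, row in zip(d2, kernel)]


def is_orthonormal(t, tol=1e-12):
    """True if t * t^t equals the identity within tol."""
    n = len(t)
    return all(
        abs(sum(x * y for x, y in zip(t[i], t[j])) - (1.0 if i == j else 0.0)) <= tol
        for i in range(n) for j in range(n)
    )


def forward(t, x):
    """y = T x."""
    return [sum(a * b for a, b in zip(row, x)) for row in t]


def inverse(t, y):
    """x = T^t y, which is the exact inverse when T is orthonormal."""
    n = len(t)
    return [sum(t[k][j] * y[k] for k in range(n)) for j in range(n)]


T2 = bas_parametric(1)
x_demo = [52, 55, 61, 66, 70, 61, 64, 73]
x_back = inverse(T2, forward(T2, x_demo))
print(is_orthonormal(T2), max(abs(p - q) for p, q in zip(x_demo, x_back)))
```
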

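A collection of approximations for the 8-point DCT based on integer functions is presented in [18]. The functions considered include the floor, ceiling, truncation, and rounding-off functions. These approximations are orthogonal or quasi-orthogonal, with low arithmetic complexity and low-complexity inversion. The approximations in [18] are multiplierless, requiring only additions and bit-shifting operations, and their computational complexity ranges from 18 to 24 additions. A prominent orthogonal member of this collection that needs no bit-shifting is based on the rounding to zero function and is given by

$$
T_3 = D_3
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 1 & 1 & 0 & 0 & -1 & -1 & -1\\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1\\
1 & 0 & -1 & -1 & 1 & 1 & 0 & -1\\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\
1 & -1 & 0 & 1 & -1 & 0 & 1 & -1\\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1\\
0 & -1 & 1 & -1 & 1 & -1 & 1 & 0
\end{bmatrix},
$$

where $D_3 = \operatorname{diag}\left(\frac{1}{\sqrt{8}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{8}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{8}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{8}}, \frac{1}{\sqrt{6}}\right)$ normalizes each row.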
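
The integer-function construction can be explored with a few lines of Python. This sketch assumes, as a simplification, that each candidate has the form fn(α·C), where C is the orthonormal 8-point DCT matrix, fn is one of the four integer functions named above, and α is a positive scale factor; the α values used below are illustrative choices, not values taken from [18]. A candidate is kept only if its rows are mutually orthogonal, in which case its normalizing diagonal matrix is diag(1/‖row‖).

```python
import math

# The four integer functions; math.trunc rounds toward zero, while the
# built-in round() rounds to the nearest integer (ties to even).
INTEGER_FUNCTIONS = {
    "floor": math.floor,
    "ceil": math.ceil,
    "trunc": math.trunc,
    "round": round,
}


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C[k][j]."""
    return [
        [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
         * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
        for k in range(n)
    ]


def integer_approximation(alpha, fn_name, n=8):
    """Apply the chosen integer function element-wise to alpha * C."""
    fn = INTEGER_FUNCTIONS[fn_name]
    return [[int(fn(alpha * v)) for v in row] for row in dct_matrix(n)]


def orthogonal_row_norms(m):
    """Squared row norms if the rows of m are mutually orthogonal, else None."""
    g = [[sum(a * b for a, b in zip(r, s)) for s in m] for r in m]
    n = len(m)
    if any(g[i][j] != 0 for i in range(n) for j in range(n) if i != j):
        return None
    return [g[k][k] for k in range(n)]


if __name__ == "__main__":
    for name, alpha in [("round", 3), ("trunc", 4)]:
        norms = orthogonal_row_norms(integer_approximation(alpha, name))
        print(name, alpha, "orthogonal" if norms else "not orthogonal", norms)
```

Both example candidates are orthogonal; their squared row norms differ (rows 2 and 6 keep all eight ±1 entries under rounding but only four under truncation), which shows how the choice of integer function changes both the zero pattern and the scaling matrix.
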

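Another interesting DCT approximation, requiring only 14 additions, is presented in [19]. It is based on a search over the 8 × 8 matrix space for candidate matrices with low computational cost. The search is restricted to orthogonal matrices with elements in the set {0, ±1, ±2}. The resulting transform matrix is

$$
T_4 = D_4
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
0 & 1 & 0 & 0 & 0 & 0 & -1 & 0\\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & -1\\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1\\
0 & 0 & 0 & 1 & -1 & 0 & 0 & 0\\
0 & -1 & 1 & 0 & 0 & 1 & -1 & 0\\
0 & 0 & 1 & 0 & 0 & -1 & 0 & 0
\end{bmatrix},
$$

where $D_4 = \operatorname{diag}\left(\frac{1}{\sqrt{8}}, \frac{1}{\sqrt{2}}, \frac{1}{2}, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{8}}, \frac{1}{\sqrt{2}}, \frac{1}{2}, \frac{1}{\sqrt{2}}\right)$.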
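
The 14-addition count can be reproduced with a butterfly structure. The sketch below assumes the T4 kernel written out in the code; `t4_fast` is a hypothetical name, and this factorization is one possible flow graph rather than necessarily the one given in [19]. It computes the unscaled outputs with 8 + 4 + 2 = 14 additions/subtractions; the D4 scaling is typically merged into the quantization step.

```python
# Unscaled T4 kernel (rows k = 0..7), before multiplication by D4.
T4_KERNEL = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, -1, 0],
    [1, 0, 0, -1, -1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, -1],
    [1, -1, -1, 1, 1, -1, -1, 1],
    [0, 0, 0, 1, -1, 0, 0, 0],
    [0, -1, 1, 0, 0, 1, -1, 0],
    [0, 0, 1, 0, 0, -1, 0, 0],
]


def t4_fast(x):
    """Unscaled T4 applied to 8 samples using 14 additions/subtractions."""
    x0, x1, x2, x3, x4, x5, x6, x7 = x
    # Stage 1: butterflies (8 operations).
    a0, a1, a2, a3 = x0 + x7, x1 + x6, x2 + x5, x3 + x4
    b0, b1, b2, b3 = x0 - x7, x1 - x6, x2 - x5, x3 - x4
    # Stage 2: even part (4 operations).
    c0, c1 = a0 + a3, a1 + a2
    d0, d1 = a0 - a3, a2 - a1
    # Stage 3: outputs 0 and 4 (2 operations).
    y0, y4 = c0 + c1, c0 - c1
    # Odd outputs are single butterfly terms, so they cost nothing extra.
    return [y0, b1, d0, b0, y4, b3, d1, b2]


def matvec(m, x):
    """Reference matrix-vector product used to check the fast version."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


x_demo = [3, -1, 4, 1, -5, 9, 2, -6]
print(t4_fast(x_demo) == matvec(T4_KERNEL, x_demo))  # True
```
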

Another transform, with 26 inserted zero elements, has been introduced in [20]; its other elements are ±1. Because of the large number of inserted zeros, it requires only 17 additions. Unfortunately, the power compaction of this transform is relatively low due to the random insertion of zeros. Moreover, the transform is not orthogonal, so its transpose cannot substitute for its inverse, which offsets the advantage of the reduced computational requirements.

An alternative approximation to the DCT has been obtained by scaling the DCT matrix and then rounding off the scaled matrix [21], [22]. The resulting transform has ±1 elements and 16 zeros. The transform is orthogonal, but it requires 22 additions, which is relatively high.
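
The scale-then-round idea can be illustrated under an explicit assumption: with a scale factor of 2 (chosen here for illustration; the section does not state the factor used in [21], [22]), round(2·C) yields a matrix of ±1 elements with exactly 16 zeros and mutually orthogonal rows, matching the description above. The hypothetical `fast_rounded_dct` evaluates it with 8 + 6 + 8 = 22 additions/subtractions.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C[k][j]."""
    return [
        [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
         * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
        for k in range(n)
    ]


def rounded_dct(scale=2.0):
    """Scale the DCT matrix, then round each element to the nearest integer."""
    return [[int(round(scale * v)) for v in row] for row in dct_matrix()]


def fast_rounded_dct(x):
    """Unscaled round(2*C) applied to 8 samples with 22 additions/subtractions."""
    x0, x1, x2, x3, x4, x5, x6, x7 = x
    # Stage 1: butterflies (8 operations).
    a0, a1, a2, a3 = x0 + x7, x1 + x6, x2 + x5, x3 + x4
    b0, b1, b2, b3 = x0 - x7, x1 - x6, x2 - x5, x3 - x4
    # Even outputs 0, 2, 4, 6 (6 operations).
    c0, c1 = a0 + a3, a1 + a2
    d0, d1 = a0 - a3, a2 - a1
    y0, y4 = c0 + c1, c0 - c1
    # Odd outputs 1, 3, 5, 7 (2 operations each, 8 in total).
    y1 = b0 + b1 + b2
    y3 = b0 - b2 - b3
    y5 = b0 - b1 + b3
    y7 = b2 - b1 - b3
    return [y0, y1, d0, y3, y4, y5, d1, y7]


def matvec(m, x):
    """Reference matrix-vector product used to check the fast version."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


K = rounded_dct()
x_demo = [3, -1, 4, 1, -5, 9, 2, -6]
print(sum(v == 0 for row in K for v in row))  # 16
print(fast_rounded_dct(x_demo))               # [7, 1, 1, 8, -21, 18, 12, -8]
```
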

This paper has two parts. In the first part, the proposed algorithm is introduced in Section 2, and Section 3 demonstrates the improved performance of the proposed transform in image compression. In the second part, an efficient GPU implementation of the proposed transform is described in Section 4. Finally, Section 5 concludes the work.


Proposed approximate DCT transform

The aim of this research is to introduce an efficient approximate DCT that attains both high energy compaction and low computational complexity. Among the methods reviewed in Section 1, the algorithm of [17] has high compaction properties and comparatively low complexity. However, it requires shift operations that may cause problems. This is because, in hardware, a shift operation may be represented as a right-shift which may incur

Image compression using the proposed transform

As mentioned earlier, the DCT and its approximations have good power compaction properties. For a typical image, such two-dimensional transforms concentrate most of the visually significant information in just a few coefficients. The international standard for lossy image compression produced by the Joint Photographic Experts Group (JPEG) employs these transforms [23]. Basically, in JPEG [11], the input image is divided into 8-by-8 (or 16-by-16) blocks, and the
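
The block-based pipeline can be sketched as follows. This is a simplified, hypothetical JPEG-like step, assuming an orthonormal 8 × 8 transform matrix T (the exact DCT here; an orthonormal approximation such as a scaled T2 could be substituted): each block X is transformed as Y = T X T^t, and, as a stand-in for JPEG's quantization tables, a simple zonal mask keeps only the top-left m × m coefficients before the inverse X' = T^t Y T.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C[k][j]."""
    return [
        [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
         * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)]
        for k in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def forward_2d(t, block):
    """Separable 2-D transform Y = T X T^t."""
    return matmul(matmul(t, block), transpose(t))


def inverse_2d(t, coeffs):
    """Inverse X = T^t Y T, valid because T is orthonormal."""
    return matmul(matmul(transpose(t), coeffs), t)


def zonal_mask(coeffs, m):
    """Keep the top-left m x m (low-frequency) coefficients and zero the rest."""
    return [[v if i < m and j < m else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(coeffs)]


T = dct_matrix()
flat = [[50.0] * 8 for _ in range(8)]                     # constant block
ramp = [[float(8 * i + j) for j in range(8)] for i in range(8)]

Y_flat = forward_2d(T, flat)
Y_ramp = forward_2d(T, ramp)
approx = inverse_2d(T, zonal_mask(Y_ramp, 2))             # keep 4 of 64 coefficients
max_err = max(abs(a - b) for ra, rb in zip(ramp, approx) for a, b in zip(ra, rb))
print(round(Y_flat[0][0], 6), max_err)
```

For the constant block, all energy lands in the single DC coefficient; for smooth blocks most energy stays in the low-frequency corner, which is what makes discarding or coarsely quantizing the remaining coefficients effective.
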

Conclusion

The Discrete Cosine Transform (DCT) is the basis of the fast transforms employed in image compression. Several approximations to the DCT have been investigated in the literature to speed up its implementation. The strategy is to convert some of the DCT elements to zeros and to modify others. There is a trade-off between the number of inserted zeros, which reduces the computational complexity, and the power compaction capability of the resulting transform. This paper introduces an

References (35)

  • T.I. Haweel, A new square wave transform based on the DCT, Signal Process. (2001)
  • R.J. Cintra et al., Low-complexity 8-point DCT approximations based on integer functions, Signal Process. (2014)
  • M. Rezaei et al., Video rate control for streaming and local recording optimized for mobile devices
  • E. Magli et al., Image compression practices and standards for geospatial information systems
  • H.-Y. Lin et al., High dynamic range imaging for stereoscopic scene representation
  • M. Bramberger et al., Real-time video analysis on an embedded smart camera for traffic surveillance
  • S. Marsi et al., Video enhancement and dynamic range control of HDR sequences for automotive applications, Adv. Signal Process. (2007)
  • N. Ahmed et al., Discrete cosine transform, IEEE Trans. Acoust. Speech Signal Process. (1984)
  • R.J. Clarke, Relation between Karhunen-Loeve and cosine transform, Proc. Inst. Elec. Eng., Pt. F (1981)
  • W.M. Abdelhafez et al., Hybrid scheme for lifting based image coding
  • G. Lakhani, Modifying JPEG binary arithmetic coder for exploiting inter/intra-block and DCT coefficient sign redundancies, IEEE Trans. Image Process. (2013)
  • H.S. Hou, A fast recursive algorithm for computing the discrete cosine transform, IEEE Trans. Acoust. Speech Signal Process. (1987)
  • S. Bouguezel et al., A multiplication-free transform for image compression
  • S. Bouguezel et al., Low complexity 8 × 8 transform for image compression, Electron. Lett. (2008)
  • S. Bouguezel et al., A fast 8 × 8 transform for image compression
  • S. Bouguezel et al., A novel transform for image compression
  • S. Bouguezel et al., A low complexity parametric transform for image compression

This paper has been recommended for acceptance by M.T. Sun.