Embedded block coding in JPEG 2000

https://doi.org/10.1016/S0923-5965(01)00028-5

Abstract

This paper describes the embedded block coding algorithm at the heart of the JPEG 2000 image compression standard. The paper discusses key considerations which led to the development and adoption of this algorithm, and also investigates performance and complexity issues. The JPEG 2000 coding system achieves excellent compression performance, somewhat higher (and, in some cases, substantially higher) than that of SPIHT with arithmetic coding, a popular benchmark for comparison. The algorithm utilizes the same low complexity binary arithmetic coding engine as JBIG2. Together with careful design of the bit-plane coding primitives, this enables execution speed comparable to that observed with the simpler variant of SPIHT without arithmetic coding. The coder offers additional advantages including memory locality, spatial random access and ease of geometric manipulation.

Introduction

JPEG 2000 [2] is a new image compression standard, developed under the auspices of ISO/IEC JTC1/SC29/WG1 (commonly known as the JPEG committee). The standard departs radically from its better known predecessor, JPEG [3]. In place of the discrete cosine transform (DCT), JPEG 2000 employs a discrete wavelet transform (DWT). Whereas arithmetic coding and successive approximation are options in JPEG, they are central concepts in JPEG 2000. The coding mechanisms themselves are more efficient and support more flexible, finely embedded representations of the image. The JPEG 2000 algorithm also inherently supports good lossless compression, competitive compression of bi-level and low bit-depth imagery, and bit-streams which embed good lossy representations of the image within a lossless representation.

JPEG 2000 places a strong emphasis on scalability, to the extent that virtually all JPEG 2000 bit-streams are highly scalable. In order to support the needs of a wide variety of applications, different progression orders are defined. The scalability property, in its different forms, pertains to the ordering of information within the bit-stream. However, as discussed next, the coding process plays a key role: in general, dependencies introduced during this process can destroy one or more degrees of scalability. Thus, while the DWT provides a natural framework for scalable image compression, the coding methods described in this paper are key to realizing the potential of this framework. Therefore, one goal of this Introduction is to precisely define the main notions of scalability involved, discussing their implications for the design of the coding scheme.

A resolution-scalable bit-stream is one from which a reduced resolution may be obtained simply by discarding unwanted portions of the compressed data. The lower resolution representation should be identical to that which would have been obtained if the lower resolution image were compressed directly. The DWT is an important tool in the construction of resolution-scalable bit-streams. As shown in Fig. 1, a first DWT stage decomposes the image into four subbands, denoted LL1, HL1 (horizontally high-pass), LH1 (vertically high-pass) and HH1. The next DWT stage decomposes this LL1 subband into four more subbands, denoted LL2, LH2, HL2 and HH2. The process continues for some number of stages, D, producing a total of 3D+1 subbands whose samples represent the original image. The total number of samples in all subbands is identical to that in the original image.

The DWT's multi-resolution properties arise from the fact that the LLd subband is a reasonable low resolution rendition of LLd−1, with half the width and height. Here, the original image is interpreted as an LL0 subband of highest resolution, while the lowest resolution is represented directly by the LLD subband. The LLd subband, 0⩽d<D, may be recovered from the subbands at levels d+1 through D by applying only D−d stages of DWT synthesis. So long as each subband from DWT stage d, 0<d⩽D, is compressed without reference to information in any of the subbands from DWT stages d′, 0⩽d′<d, we may convert a compressed image into a lower resolution compressed image, simply by discarding those subbands which are not required. The number of resolutions available in this way is D+1.
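To make the subband structure concrete, the following minimal sketch decomposes an image through D stages, producing the 3D+1 subbands described above, and checks that the total sample count is preserved. It is illustrative only: a separable one-level Haar analysis step is used for brevity, not the 5/3 or 9/7 lifting kernels of JPEG 2000, and the function names are ours. Keeping only LLD (plus, optionally, the detail subbands of the coarsest stages) yields the reduced resolutions directly.

    # Illustrative sketch only: one-level separable Haar analysis, repeated D
    # times on the LL band.  JPEG 2000 itself uses the 5/3 or 9/7 kernels with
    # lifting and boundary extension; Haar is chosen purely for brevity.
    import numpy as np

    def haar_analysis_2d(x):
        """Split an even-sized 2-D array into (LL, HL, LH, HH) subbands."""
        lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)    # horizontal low-pass
        hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)    # horizontal high-pass
        LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)  # low-pass in both directions
        LH = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)  # vertically high-pass
        HL = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)  # horizontally high-pass
        HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)  # high-pass in both directions
        return LL, HL, LH, HH

    def dwt(image, D):
        """D-stage dyadic decomposition: returns LL_D plus 3*D detail subbands."""
        details, LL = [], image.astype(float)
        for d in range(1, D + 1):
            LL, HL, LH, HH = haar_analysis_2d(LL)
            details.append((d, HL, LH, HH))
        return LL, details   # 1 + 3*D subbands in total

    img = np.random.rand(256, 256)
    LL3, details = dwt(img, D=3)
    print(LL3.shape)   # (32, 32): the lowest resolution rendition
    # The total number of subband samples equals the number of image samples.
    total = LL3.size + sum(band.size for (_, HL, LH, HH) in details
                           for band in (HL, LH, HH))
    print(total == img.size)   # True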

A second type of scalability arises when the compressed bit-stream contains elements which can be discarded in order to obtain a lower quality (higher distortion) representation of the subband samples. We refer to this as distortion scalability. Ideally, the reduced quality representations obtained by discarding appropriate elements from a distortion scalable bit-stream can be decoded to reconstruct the original image with a fidelity approaching that of an “optimal” coder, tailored to produce the same bit-rate as the scaled bit-stream.

Most practical means of achieving this goal involve some form of bit-plane coding, whereby the magnitude bits of the subband samples are coded one by one from most significant to least significant. Discarding least significant bits is equivalent to coarser quantization of the original subband samples. The terms “SNR scalability”, “successive approximation” and “bit-rate scalability” have also been used in connection with this type of scalability.
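The equivalence between discarding least significant bits and coarser quantization can be seen in a small sketch. The dead-zone quantizer and sign-magnitude representation below are simplifications chosen for brevity (the step sizes and names are hypothetical), not the exact quantizer of the standard.

    # Sketch: discarding the k least significant magnitude bit-planes of a
    # (sign, magnitude) quantized sample is equivalent to quantizing with a
    # step size 2^k times larger.
    def quantize(value, step):
        """Dead-zone scalar quantizer: returns (sign, magnitude index)."""
        sign = -1 if value < 0 else 1
        return sign, int(abs(value) / step)

    def drop_bitplanes(magnitude, k):
        """Keep only the bits above bit-plane k (truncate the embedding)."""
        return magnitude >> k

    value, step, k = 723.4, 2.0, 3
    _, mag = quantize(value, step)
    # Truncating k bit-planes gives the same index as a step 2^k times coarser.
    assert drop_bitplanes(mag, k) == quantize(value, step * (1 << k))[1]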

Although resolution scalability (the ability to discard high frequency subbands) provides a crude mechanism for decreasing the bit-rate and increasing distortion, this is not usually an efficient mechanism for trading distortion for compressed size. It has been observed that discarding subbands from a compressed bit-stream generally produces lower resolution images with such small distortion (and large bit-rate) as to be inappropriate for applications requiring significant compression. In order to produce a family of successively lower image resolutions with a consistent level of perceived or objective distortion (e.g., a consistent mean squared error), the multi-resolution transform should be combined with distortion scalable coding.

Unfortunately, due to possible dependencies introduced during the coding process, the combination of a wavelet transform with bit-plane coding does not guarantee bit-streams that are both resolution-scalable and distortion-scalable. Furthermore, the order in which information appears within the compressed bit-stream can have a substantial impact on the resources required to compress or decompress a large image. The zero-tree coding structure [14] provides us with a useful example of the adverse consequences of excessive interaction between coding and ordering. Shapiro's original EZW algorithm [14] and Said and Pearlman's significantly enhanced SPIHT algorithm [12] provide excellent examples of embedded image compression. These algorithms have rightly received tremendous attention in the image compression community. However, the coding dependencies introduced by these algorithms dictate a distortion-progressive ordering of the compressed bits, as zero-trees involve downward dependencies between the subbands produced by successive DWT stages. These dependencies interfere with resolution scalability: no subset of the embedded bit-stream corresponds to the result of compressing a lower resolution image. Moreover, the encoder and decoder typically require a random access buffer, with storage for every subband sample in the image. Once compressed in this manner, the bit-stream cannot be reordered so as to support decompressors with reduced memory resources.

The JPEG standard also involves coding dependencies which prohibit some useful orderings. In its hierarchical refinement mode, multi-resolution image hierarchies are represented using a Laplacian pyramid structure which requires lower resolutions to be fully decoded before meaningful decoding of a higher resolution image can take place. This representation interferes with the distortion scalability offered by JPEG's successive approximation mode, since it is not possible to decompress a subset of the bit-planes across all resolution levels. This problem is dual to the one observed for zero-tree coding. In JPEG's progressive modes, any scalable bit-stream necessarily involves multiple scans through the entire image. Moreover, these progressive scans use different coding techniques to those specified by the sequential mode. As a result, they cannot generally be collapsed back into a sequential representation without transcoding the compressed bit-stream.

The arguments advanced above suggest that one should endeavour to decouple the process of efficiently coding subband samples from the ordering of the compressed bit-stream. The separation of information coding and information ordering is indeed a key consideration in the design of the JPEG 2000 algorithm. As a result, and in contrast to the above examples, the JPEG 2000 standard supports spatially progressive organizations which allow decompressors to work through the image from top to bottom. Information may progress in order of increasing resolution, in order of increasing quality across all resolutions, or in sequential fashion across all resolutions and qualities. The progression order is independent of the coding techniques and may be adjusted at will, without recourse to transcoding. JPEG 2000 also allows resource constrained decompressors to recover a reduced resolution version of an image which may be too large to decompress in its entirety.

Of course, it is not possible to completely decouple the coding and ordering of information, since efficient coding necessarily introduces dependencies. Sources of such dependencies include the use of conditional coding contexts, indivisible codes (e.g., vector, run-length or quad-tree codes) and adaptive probability models. There is also a limit to the granularity at which we can afford to label individual elements of the compressed bit-stream for subsequent reordering.

A natural compromise is to partition the subband samples into small blocks and to code each block independently. The various dependencies described above may exist within a block but not between different blocks. The size of the blocks determines the degree to which one is prepared to sacrifice coding efficiency in exchange for flexibility in the ordering of information within the final compressed bit-stream. This block coding paradigm is adopted by JPEG 2000, based on the concept of Embedded Block Coding with Optimal Truncation (EBCOT) [17]. Each block generates an independent embedded bit-stream, and these bit-streams are packed into quality layers. In order to generate the quality layers, each block's bit-stream is in turn subdivided into a large number of "chunks". While preserving the ordering of chunks within a block, the compressor is free to interleave chunks from the various blocks in any desired fashion, thus assigning incremental contributions from each block to each quality layer. The independent bit-streams can be truncated at the end-points of these chunks, which are referred to as truncation points.
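The sketch below illustrates the packing of per-block chunks into quality layers. The data and the assignment policy (one chunk per block per layer) are purely illustrative; a real encoder chooses each block's incremental contribution by rate-distortion optimization. The only invariant demonstrated is the one stated above: each block's chunks appear in their original order across the layers, so decoding any leading set of layers truncates every block at one of its allowable truncation points.

    # Sketch: interleaving per-block chunks into quality layers.
    def build_layers(block_chunks, num_layers):
        """block_chunks: list over blocks, each a list of byte-string chunks
        already ordered by truncation point.  Returns a list of layers, each a
        list of (block_index, chunk) contributions."""
        layers = [[] for _ in range(num_layers)]
        for layer in range(num_layers):
            for b, chunks in enumerate(block_chunks):
                if layer < len(chunks):          # this block still has data left
                    layers[layer].append((b, chunks[layer]))
        return layers

    blocks = [[b"B0c0", b"B0c1", b"B0c2"], [b"B1c0"], [b"B2c0", b"B2c1"]]
    for i, layer in enumerate(build_layers(blocks, 3)):
        print("layer", i, layer)
    # Decoding layers 0..q uses a prefix of every block's embedded bit-stream.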

The selection of truncation points raises, again, an ordering problem, since it affects the rate-distortion properties of the overall image representation. In a bit-plane coding scheme, bit-plane end-points are natural truncation points for the embedded bit-stream. However, the availability of a finer embedding, with many more useful truncation points, is a key element in the success of the EBCOT paradigm. To achieve a finer embedding, the sequence in which bits from different samples are coded is data dependent. This sequence tends to encode the most valuable information (in the sense of reducing the distortion of the reconstructed image the most) as early as possible. The embedded block coder uses context modelling to address both the ordering and the arithmetic coding of the events. The concept of adaptive ordering through context modelling was introduced independently and in somewhat different forms in [5], [8]. It is also closely related to the coding sequence employed in the SPIHT [12] and, to a lesser extent, EZW [14] algorithms. Rather than pursuing a totally adaptive approach, as in [5], JPEG 2000 imposes reasonable assumptions on the data, defining context-dependent "fractional bit-planes", in the spirit of [8].

Thus, the block coding concept in JPEG 2000 and the embedded coder itself draw heavily from the EBCOT algorithm [17], which itself builds upon the contributions of other works; however, there are some notable differences as well as a number of mode variations which can have significant practical implications. In this paper, our goal is to provide the reader with an appreciation for the salient features of the algorithm, as well as some of the considerations which have contributed to its development.

The rest of this paper is organized as follows. In Section 2, we discuss the EBCOT paradigm and its advantages. In Section 3, we present the primitive coding operations which form the foundation of the embedded block coding strategy. In Section 4, we introduce the concept of fractional bit-planes, and discuss the principles behind it. In Section 5, we provide some indication of the performance of the algorithm, while in Section 6 we discuss its complexity, both for software and hardware implementations. Finally, in Section 7, we present variations on the algorithm, which are supported by Part 1 of the standard.

Section snippets

Independent code-blocks

Within the EBCOT paradigm adopted by JPEG 2000, each subband is partitioned into relatively small blocks (e.g., 64×64 or 32×32 samples) which we call “code-blocks”. This is illustrated in Fig. 2. Each code-block, Bi, is coded independently, producing an elementary embedded bit-stream, ci. It is convenient to restrict our attention to a finite number of allowable truncation points, Zi+1, for code-block Bi, having lengths, Li(z), with

0 = Li(0) ⩽ Li(1) ⩽ ⋯ ⩽ Li(Zi).

In the present development we are not
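As an illustration of how such truncation points might be exploited, the sketch below selects, for each block independently, the truncation point that minimizes distortion plus λ times length for a common multiplier λ, in the spirit of EBCOT's rate-distortion optimized truncation. The per-block length/distortion tables are hypothetical, the convex-hull restriction on candidate points is omitted, and the search over λ (e.g., by bisection to meet a length budget) is left out for brevity.

    # Minimal sketch of Lagrangian truncation-point selection.
    def truncate_blocks(blocks, lam):
        """blocks: list of (lengths, distortions) per block, both indexed by z,
        with lengths[0] == 0.  Returns chosen z per block and the totals."""
        total_len = total_dist = 0
        choices = []
        for lengths, dists in blocks:
            z = min(range(len(lengths)), key=lambda z: dists[z] + lam * lengths[z])
            choices.append(z)
            total_len += lengths[z]
            total_dist += dists[z]
        return choices, total_len, total_dist

    # Hypothetical per-block (length, distortion) tables, for illustration only.
    blocks = [([0, 100, 250, 400], [900.0, 400.0, 150.0, 60.0]),
              ([0,  80, 300],      [500.0, 300.0,  40.0])]
    print(truncate_blocks(blocks, lam=1.0))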

Bit-plane coding

The coding of code-blocks in JPEG 2000 proceeds by bit-planes. Bit-plane coding naturally arises in the framework of embedded quantization, as discussed in Section 3.1. In Section 3.2, we show how coding proceeds in order to derive an embedded bit-stream, and we discuss the importance of data dependent ordering strategies for achieving a fine embedding of the information. The remainder of the section is devoted to a detailed study of the primitive context modeling and coding operations which

Fractional bit-plane scan

In this section, we specify the order in which samples are visited when a given bit-plane is scanned. As discussed in Section 3.2, this order is data dependent, and is aimed at improving the embedding of the code-stream. This goal is achieved through multiple coding passes. For each bit-plane, p, the coding proceeds in a number of distinct passes, which we identify as "fractional bit-planes": P1(p), P2(p) and P3(p). Each coding pass involves a scan through the code-block samples in stripes of height
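As a rough illustration of how samples can be partitioned among the three passes of a bit-plane, the sketch below labels each sample according to its own significance state and that of its eight neighbours, in the spirit of the standard's significance propagation, magnitude refinement and cleanup passes. Context formation, sign coding, the stripe-oriented scan order and the arithmetic coder itself are all omitted; the function and variable names are ours.

    # Sketch: pass membership for one bit-plane, given which samples became
    # significant in earlier (more significant) bit-planes.
    import numpy as np

    def classify_passes(significant):
        """Label each sample 1, 2 or 3 according to the pass in which its bit
        for the current bit-plane would be coded."""
        H, W = significant.shape
        passes = np.full((H, W), 3, dtype=int)       # default: cleanup pass
        for y in range(H):
            for x in range(W):
                if significant[y, x]:
                    passes[y, x] = 2                 # magnitude refinement
                else:
                    nbrs = significant[max(0, y-1):y+2, max(0, x-1):x+2]
                    if nbrs.any():                   # has a significant neighbour
                        passes[y, x] = 1             # significance propagation
        return passes

    sig = np.zeros((8, 8), dtype=bool)
    sig[3, 3] = True
    print(classify_passes(sig))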

Compression performance

In this section we provide some indication of the performance of the JPEG 2000 coder by comparing it with the SPIHT algorithm [12], which has become a popular benchmark for image compression. A 5-level DWT with the Cohen–Daubechies–Feauveau 9/7 biorthogonal wavelet kernels [1] is used for these experiments. Since both algorithms employ exactly the same wavelet transform and exactly the same quantization strategy, PSNR
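For reference, the PSNR figures quoted in comparisons of this kind are normally computed as follows for 8-bit imagery; this is a minimal sketch of the standard definition, with illustrative test data.

    # PSNR as commonly reported for 8-bit imagery: 10*log10(255^2 / MSE).
    import numpy as np

    def psnr(original, reconstructed, peak=255.0):
        mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(peak * peak / mse)

    a = np.random.randint(0, 256, (64, 64))
    b = np.clip(a + np.random.randint(-2, 3, a.shape), 0, 255)
    print(round(psnr(a, b), 2))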

Complexity considerations

Our purpose in this section is to suggest that the JPEG 2000 coder is able to meet the demands of high performance applications. We begin with a brief discussion of the MQ arithmetic coder. Contrary to popular belief, arithmetic coding need not be a highly complex operation. We then provide evidence from our experience in working with software implementations of the standard. Finally, Sections 6.3 (Hardware implementation) and 6.4 (Buffering resources) provide some useful statistics which may be used to

Mode variations

The JPEG 2000 standard allows for a number of variations on the algorithm described thus far. The mode variations are sufficiently minor that the need to support all modes may not impose a significant burden on decoder implementations.

Summary

JPEG 2000 is an advanced image compression standard which incorporates and emphasizes many features not found in earlier compression standards. Many of these features are imparted by independent, embedded coding of individual blocks of subband samples. In this paper, we have described the embedded block coding algorithm, its various advantages, indicative compression performance and some of its implications for implementation complexity.

There is no doubt that the JPEG 2000 standard is

References (23)

  • A. Cohen et al., Biorthogonal bases of compactly supported wavelets, Comm. Pure Appl. Math. (1992)
  • ISO/IEC 15444-1: JPEG 2000 image coding system,...
  • ISO/IEC 10918-1, ITU Recommendation T.81: Digital compression and coding of continuous tone still images, requirements...
  • J. Li, P. Cheng, C.-C. J. Kuo, On the improvements of embedded zerotree wavelet (EZW) coding, in: Proceedings of the...
  • J. Li, S. Lei, Rate-distortion optimized embedding, in: Proceedings of the Picture Coding Symposium, Berlin, 1997, pp....
  • M. Marcellin, T. Flohr, A. Bilgin, D. Taubman, E. Ordentlich, M. Weinberger, G. Seroussi, C. Chrysafis, T. Fischer, B....
  • E. Ordentlich, D. Taubman, M. Weinberger, G. Seroussi, M. Marcellin, Memory efficient scalable line-based image coding,...
  • E. Ordentlich, M. Weinberger, G. Seroussi, A low-complexity modeling approach for embedded coding of wavelet...
  • E. Ordentlich, M. Weinberger, G. Seroussi, On modeling and ordering for embedded image coding, in: Proceedings of the...
  • W. Pennebaker et al., JPEG: Still Image Data Compression Standard (1992)
  • W. Pennebaker et al., An overview of the basic principles of the Q-Coder adaptive binary arithmetic coder, IBM J. Res. Develop. (1988)