Elsevier

Parallel Computing

Volume 31, Issues 10–12, October–December 2005, Pages 1082-1098

High performance JPEG 2000 and MPEG-4 VTC on SMPs using OpenMP

https://doi.org/10.1016/j.parco.2005.03.013

Abstract

JPEG 2000 and MPEG-4 Visual Texture Coding (VTC) are both wavelet-based and state of the art in still image coding. In this paper we present sequential and parallel strategies for speeding up two selected implementations of MPEG-4 VTC and JPEG 2000; the parallel strategies use OpenMP, a popular shared-memory programming interface. Furthermore, we discuss the sequential and parallel performance of the improved versions and compare the efficiency of both algorithms.

Introduction

Image and video coding methods based on wavelet transforms achieve high compression ratios while maintaining good image quality, and they have attracted considerable interest in the scientific community as competitors to DCT-based compression schemes. With the finalization of the wavelet-based JPEG 2000 standard [1] and the inclusion of a wavelet algorithm for synthetic/natural hybrid coding in MPEG-4 (MPEG-4 VTC) [2], [3], wavelet image compression must now clearly be considered the state of the art.

In this work we show how to improve the runtime performance of MPEG-4 VTC and JPEG 2000. First, we improve the wavelet decomposition by reorganizing the order in which the data are processed, which improves cache utilization. Second, we exploit parallelism within the two major coding stages of each algorithm to further speed up execution: the wavelet lifting and the code-block processing in JPEG 2000, and the convolution-based wavelet filtering and the zerotree coding in MPEG-4 VTC.
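As an illustration of code-block level parallelism, the following C sketch distributes independent JPEG 2000 code blocks over OpenMP threads. It is not taken from Jasper: the `CodeBlock` type and the `process_codeblock` and `process_all_codeblocks` functions are hypothetical, and the per-block work is reduced to counting the magnitude bit-planes a block would need, standing in for the real EBCOT bit-plane coding. Because JPEG 2000 code blocks are entropy-coded independently of each other, the loop iterations carry no dependencies; `schedule(dynamic, 1)` is one plausible choice for balancing blocks of very different cost, not necessarily the one used in this paper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical code-block descriptor: a rectangle of quantized wavelet
 * coefficients inside a row-major subband with `stride` ints per row. */
typedef struct {
    const int *coeff;
    int width, height, stride;
    int num_bitplanes;   /* output: magnitude bit-planes the coder would visit */
} CodeBlock;

/* Stand-in for the per-block entropy-coding work: determine how many
 * magnitude bit-planes the block needs (0 for an all-zero block). */
static void process_codeblock(CodeBlock *cb)
{
    int maxmag = 0;
    for (int y = 0; y < cb->height; y++)
        for (int x = 0; x < cb->width; x++) {
            int m = abs(cb->coeff[y * cb->stride + x]);
            if (m > maxmag) maxmag = m;
        }
    int planes = 0;
    while (maxmag > 0) { planes++; maxmag >>= 1; }
    cb->num_bitplanes = planes;
}

/* Code blocks are independent, so they can be handed out to threads;
 * dynamic scheduling balances blocks whose coding cost differs widely. */
static void process_all_codeblocks(CodeBlock *blocks, int n)
{
    assert(n >= 0);
#pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < n; i++)
        process_codeblock(&blocks[i]);
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs sequentially, which makes it easy to compare the sequential and parallel builds from the same source.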

The reference software used in our experiments is the MPEG-4 MoMuSys (Mobile Multimedia Systems) Verification Model of August 1999 (ISO/IEC JTC1/SC29/WG11 N2805) and the Jasper JPEG 2000 reference implementation (by Michael D. Adams, available at http://www.ece.ubc.ca/mdadams); both are written in C. We use OpenMP (http://www.openmp.org) to implement our parallel concept for execution on shared-memory multiprocessors, which are known to be attractive hardware platforms for image processing applications [4]. Parallel results are presented for two multiprocessor platforms: an SGI Power Challenge (20 IP25 RISC CPUs running at 195 MHz) and an SGI Origin3800 (128 MIPS R12000 RISC CPUs running at 400 MHz). Note that the following discussion focuses on the two official reference implementations considered here. Other software packages (such as the JPEG 2000 VM 6.0 or the Kakadu software) already contain some of the proposed techniques, or similar ones, for optimizing cache behavior.

Lucka and Sorevik [5] propose an OpenMP-based parallelization of a first-generation wavelet compression scheme. Message-passing-based parallelizations of second-generation wavelet image coding systems (i.e. tree-based or EBCOT-like schemes) are discussed by Feil and Uhl [6], [7] and Kutil [8]. In this work we apply OpenMP-based parallelization techniques to the second-generation wavelet image coding systems JPEG 2000 and MPEG-4 VTC. In Section 2, we briefly review JPEG 2000 and MPEG-4 VTC and compare the two standards in terms of compression and execution performance. Section 3 discusses and resolves cache organization problems in the considered reference implementations. Parallelization strategies and the corresponding experimental results are covered in Section 4. Section 5 concludes the paper.

Section snippets

Wavelet-based image compression standards

Here we present the basic features and techniques implemented in the two still image coding standards JPEG 2000 and MPEG-4 VTC. First, both algorithms are described. Second, their coding performance is compared, and the results are related to the performance of the well-known JPEG image coding standard. The last part of this section gives a runtime analysis of JPEG 2000 and MPEG-4 VTC.

Cache issues

In Fig. 5a, the runtime of the first decomposition level of the MoMuSys MPEG-4 VTC DWT is split into vertical and horizontal filtering. There is a significant difference between vertical and horizontal filtering performance, especially as the image dimensions increase: for large images, vertical filtering is up to 5–8 times slower than horizontal filtering.
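One common explanation for such a gap, stated here as an assumption rather than as this paper's analysis, is memory layout: images are stored row by row, so filtering a single column touches one element per row with a stride of a full image row, and each loaded cache line contributes only one useful value. The C sketch below illustrates the idea of aggregating columns: it processes groups of `agg` adjacent columns together, so every access to a row reads consecutive addresses. For brevity it uses a vertical Haar analysis step instead of the filter banks of MoMuSys or Jasper, and the function name `vertical_haar` and the `agg` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One vertical Haar analysis step on a row-major w x h image (h even).
 * Lowpass rows are written to the top half of `out`, highpass rows to the
 * bottom half. Columns are handled in groups of `agg` adjacent columns:
 *   agg == 1 -> classic column-by-column filtering (stride-w accesses)
 *   agg == w -> each pair of rows is swept contiguously (cache friendly) */
static void vertical_haar(const float *in, float *out,
                          size_t w, size_t h, size_t agg)
{
    assert(h % 2 == 0 && agg >= 1);
    size_t half = h / 2;
    for (size_t c0 = 0; c0 < w; c0 += agg) {
        size_t c1 = (c0 + agg < w) ? c0 + agg : w;   /* end of column group */
        for (size_t r = 0; r < half; r++) {
            const float *even = in + (2 * r) * w;     /* input row 2r   */
            const float *odd  = in + (2 * r + 1) * w; /* input row 2r+1 */
            float *lo = out + r * w;
            float *hi = out + (half + r) * w;
            /* Inner loop walks adjacent columns, i.e. consecutive addresses. */
            for (size_t c = c0; c < c1; c++) {
                lo[c] = 0.5f * (even[c] + odd[c]);
                hi[c] = 0.5f * (even[c] - odd[c]);
            }
        }
    }
}
```

With `agg == 1` the loop order matches plain column-by-column filtering; with `agg == w` each pair of rows is processed in one contiguous sweep. Both orders compute identical coefficients, so only the memory access pattern, and hence the runtime, changes.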

A very similar runtime gap can be observed in the JPEG 2000 reference implementation Jasper (see Fig. 5

Parallelization using OpenMP

OpenMP [19] (http://www.openmp.org) is an efficient tool for parallel programming in shared-memory environments. It can be seen as a programming interface that abstracts the use of threads: thread creation and management, as well as synchronization between threads, are hidden behind compiler directives, so-called pragmas. These pragmas provide constructs for executing sections of a sequential program (e.g. loops) in parallel.
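A minimal illustration of such a pragma, assuming a row-major float image: in a horizontal Haar analysis step every row is filtered independently, so a single `#pragma omp parallel for` directive splits the row loop among threads. The Haar filter and the function name `horizontal_haar_omp` are chosen for brevity and are not taken from either reference implementation.

```c
#include <assert.h>

/* Horizontal Haar analysis on every row of a row-major w x h image (w even).
 * Lowpass coefficients go to the left half of each output row, highpass
 * coefficients to the right half. Rows are independent, so the work-sharing
 * directive can distribute them among threads without synchronization. */
static void horizontal_haar_omp(const float *in, float *out, int w, int h)
{
    assert(w % 2 == 0);
    int half = w / 2;
#pragma omp parallel for schedule(static)
    for (int r = 0; r < h; r++) {
        const float *src = in + (long)r * w;
        float *dst = out + (long)r * w;
        for (int k = 0; k < half; k++) {
            dst[k]        = 0.5f * (src[2 * k] + src[2 * k + 1]);
            dst[half + k] = 0.5f * (src[2 * k] - src[2 * k + 1]);
        }
    }
}
```

Without OpenMP compilation the directive is ignored, so the same source also builds as a sequential program. The loop index is a signed `int` because OpenMP versions before 3.0 require a signed integer loop variable.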

Conclusion

The runtime performance of the MoMuSys MPEG-4 VTC and the JPEG 2000 reference implementation Jasper is improved significantly by implementing aggregated vertical filtering. Depending on certain parameters, a parallel version can further speed up the execution of both algorithms to some extent. The aggregated version's parallel efficiency is better, mainly due to cache and bus phenomena. However, the scalability is very limited due to the relatively large amount of inherently sequential...
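The limiting effect of sequential code can be quantified with Amdahl's law: if a fraction f of the sequential runtime is perfectly parallelizable, the speedup on p processors is at most S(p) = 1 / ((1 - f) + f / p), and the parallel efficiency is S(p) / p. The C helpers below evaluate this bound; the function names are hypothetical, any fractions used with them are illustrative rather than measurements from this paper, and the model deliberately ignores the cache and bus effects mentioned above.

```c
#include <assert.h>

/* Amdahl's law: upper bound on the speedup with p processors when a fraction
 * `parallel` of the sequential runtime parallelizes perfectly and the rest
 * stays sequential. Cache, bus, and synchronization overheads are ignored. */
static double amdahl_speedup(double parallel, int p)
{
    assert(parallel >= 0.0 && parallel <= 1.0 && p >= 1);
    return 1.0 / ((1.0 - parallel) + parallel / p);
}

/* Parallel efficiency: achieved speedup divided by the processor count. */
static double amdahl_efficiency(double parallel, int p)
{
    return amdahl_speedup(parallel, p) / p;
}
```

For example, with f = 0.8 the speedup can never exceed 1 / 0.2 = 5, no matter how many processors are used, and the efficiency drops steadily as p grows.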

Acknowledgment

This work has been partially supported by the Austrian Science Fund (project FWF-13903).

References (26)

  • M. Feil et al., Motion-compensated wavelet packet zerotree video coding on multicomputers, Journal of Systems Architecture (2003)
  • R. Kutil, Approaches to zerotree image and video coding on MIMD architectures, Parallel Computing (2002)
  • J. Lu, Parallelizing Mallat algorithm for 2-D wavelet transforms, Information Processing Letters (1993)
  • D. Taubman et al., JPEG2000—Image Compression Fundamentals, Standards and Practice (2002)
  • I. Sodagar et al., Scalable wavelet coding for synthetic/natural hybrid coding, IEEE Transactions on Circuits and Systems for Video Technology (1999)
  • ISO/IEC 14496-2, Information technology—coding of audio-visual objects—Part 2: Visual, December...
  • C. Rothlübbers, R. Orglmeister, Parallel image processing using a Pentium based shared-memory multiprocessor system,...
  • M. Lucka, T. Sorevik, Parallel wavelet-based compression of two-dimensional data, in: A. Handlovicova, M. Komornikova,...
  • M. Feil et al., Wavelet packet zerotree image coding on multicomputers
  • W. Pennebaker et al., JPEG—Still Image Compression Standard (1993)
  • D. Taubman, High performance scalable image compression with EBCOT, IEEE Transactions on Image Processing (2000)
  • J.M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Transactions on Signal Processing (1993)
  • A. Said et al., A new, fast, and efficient image codec based on set partitioning in hierarchical trees, IEEE Transactions on Circuits and Systems for Video Technology (1996)