Real-time DSP implementation of motion-JPEG2000 using overlapped block transferring and parallel-pass methods
Introduction
The JPEG2000 compression standard was created to provide higher compression efficiency than JPEG [1]. It offers a rich set of features, including lossy-to-lossless compression, multiple-resolution representation, an embedded bit-stream, region-of-interest (ROI) coding, and error resilience [2], [3], [4].
Motion-JPEG2000 (MJP2) is intended to provide a new coding system, based on JPEG2000, for the video communication market and its applications. The core technology of MJP2 is an intra-frame coding system, which differs from the current moving-picture standards, MPEG (MPEG-1, 2 and 4). It is well known that MPEG outperforms Motion-JPEG in compression efficiency because MPEG takes advantage of motion prediction between pictures. However, a recent study [5] reports that MJP2 outperforms MPEG-2 and MPEG-4 in both compression rate and error resilience. The advantage of MJP2 is especially pronounced in error-prone environments, which is important for consumer applications as well as professional broadcasting systems.
DSP technology for multimedia applications is also evolving quickly. Recent DSPs provide single instruction multiple data (SIMD) instructions [6]. By packing several small data items into one larger register, a SIMD instruction processes multiple data items at once and thus reduces the execution time drastically.
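The packing idea can be illustrated in portable C. The sketch below emulates a packed 16-bit addition on a 32-bit word, similar in spirit to what a packed-add instruction (for example, the one behind the TI C6000 compiler's `_add2` intrinsic) performs in a single operation. The helper names `pack2`, `lane_hi`, `lane_lo`, and `add2_emulated` are hypothetical and are not taken from the paper.

```c
#include <stdint.h>

/* Pack two signed 16-bit lanes into one 32-bit word
   (hi: bits 31..16, lo: bits 15..0). */
static uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Reinterpret the low 16 bits as a signed value without relying on
   implementation-defined narrowing conversions. */
static int16_t to_signed16(uint32_t bits)
{
    int32_t v = (int32_t)(bits & 0xFFFFu);
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

static int16_t lane_hi(uint32_t w) { return to_signed16(w >> 16); }
static int16_t lane_lo(uint32_t w) { return to_signed16(w); }

/* Add both lanes at once. Each lane wraps modulo 2^16, and a carry out of
   the low lane never reaches the high lane. */
static uint32_t add2_emulated(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0x0000FFFFu;
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;
    return hi | lo;
}
```

For instance, `add2_emulated(pack2(100, -3), pack2(23, 5))` holds 123 in the high lane and 2 in the low lane. On a DSP with real packed instructions the same work is done by one instruction instead of the masks and shifts used here.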
In this paper, we present an embedded MJP2 system structure that encodes video in real time. The architecture consists primarily of three modules: the video acquisition module, which obtains image data from two analog cameras; the MJP2 encoder module; and the local area network (LAN) module, which transmits encoded codestreams via the Internet. For the MJP2 encoder, we propose the overlapped block transferring (OBT) method, which exploits cache behavior to speed up the DWT. Instead of using the line-based lifting scheme, the image is divided into overlapped subblocks, and each overlapped subblock is processed by a 2-D lifting algorithm to increase the cache hit rate. We show that the OBT-based lifting scheme, combined with the SIMD instructions and the superscalar pipeline structure of the DSP, can increase the performance of the DWT drastically. Moreover, we propose a parallel-pass method for fast implementation of EBCOT. This method reduces the processing time of EBCOT by processing the three coding passes of the same bit-plane in parallel.
The paper is organized as follows. Section 2 presents the proposed system-level architecture. Section 3 proposes the OBT-based lifting scheme with the SIMD instructions and the parallel-pass processing for EBCOT. Section 4 discusses the performance of the proposed system, and Section 5 gives the conclusions.
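As a rough illustration of the partitioning step, the following C sketch splits one image axis into subblocks that each load a few extra samples from their neighbours. The type `BlockSpan`, the function `split_overlapped`, and the parameters `core` and `overlap` are hypothetical names chosen here; the text above does not give the subblock size or overlap width used in the actual system.

```c
#include <stddef.h>

/* One subblock along a single axis. [core_begin, core_end) is the region
   whose outputs the block owns; [load_begin, load_end) also includes the
   overlap samples shared with neighbouring blocks. */
typedef struct {
    size_t core_begin, core_end;
    size_t load_begin, load_end;
} BlockSpan;

/* Split an axis of n samples into blocks of `core` samples, each extended
   by `overlap` samples on both sides and clamped to the image boundary.
   Returns the number of spans written (at most max_blocks).
   `core` and `overlap` are assumed tuning parameters. */
static size_t split_overlapped(size_t n, size_t core, size_t overlap,
                               BlockSpan *out, size_t max_blocks)
{
    size_t count = 0;
    if (core == 0) {
        return 0;
    }
    for (size_t begin = 0; begin < n && count < max_blocks; begin += core) {
        size_t end = (n - begin < core) ? n : begin + core;
        BlockSpan b;
        b.core_begin = begin;
        b.core_end = end;
        b.load_begin = (begin < overlap) ? 0 : begin - overlap;
        b.load_end = (n - end < overlap) ? n : end + overlap;
        out[count++] = b;
    }
    return count;
}
```

Applying the same split to rows and to columns yields 2-D overlapped subblocks. The overlap has to be at least as wide as the support of the lifting filter, so that a subblock small enough to stay in cache can be transformed without fetching further data from its neighbours.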
The implemented MJP2 system architecture
Fig. 1 shows the block diagram of the proposed hardware implementation of the MJP2 encoder. The system is being developed with an ALTERA MAX7256 (256 LEs) and a TMS320C6416 (600 MHz, 4800 MIPS, 128 kb cache). The video acquisition module captures NTSC and RS-170 analog video. The analog video is digitized into YUV 4:2:2 formatted video with two separate fields. These two fields are merged into a frame by the FPGA shown in Fig. 1. The generated frame is fed to the MJP2 encoder module for compression. Since the
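The field-merging step can be modelled in software as a simple row interleave. In the system described above this merge is performed in hardware, so the C function below is only an illustrative model; the name `weave_fields`, the buffer layout, and the assumption that the first field carries the even frame rows are choices made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interleave two fields into one frame. Assumes `top` holds frame rows
   0, 2, 4, ... and `bottom` holds rows 1, 3, 5, ...; each field has
   field_rows rows of row_bytes bytes, and `frame` has room for
   2 * field_rows rows. */
static void weave_fields(const uint8_t *top, const uint8_t *bottom,
                         uint8_t *frame, size_t field_rows, size_t row_bytes)
{
    for (size_t r = 0; r < field_rows; ++r) {
        memcpy(frame + (2 * r) * row_bytes, top + r * row_bytes, row_bytes);
        memcpy(frame + (2 * r + 1) * row_bytes, bottom + r * row_bytes,
               row_bytes);
    }
}
```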
The proposed software architecture of MJP2
Among the modules of a JPEG2000 encoder, the lifting algorithm for the discrete wavelet transform (DWT) and the embedded block coding with optimized truncation (EBCOT) account for more than 85% of the encoding complexity. Optimizing these two modules is therefore essential for performance. The latest DSP chips enable real-time implementation of the DWT and adaptive binary arithmetic coding [6]. Utilizing the hardware features of the DSP chip, we optimize
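For readers unfamiliar with lifting, a minimal C sketch of one 1-D lifting level follows. It uses the reversible 5/3 filter defined in JPEG2000 Part 1 with whole-sample symmetric extension; the text here does not state which filter the implementation uses, so that choice is an assumption, as are the function names `fwd53` and `inv53`. The sketch is plain scalar code and does not reflect the OBT or SIMD optimizations.

```c
/* Floor division for a positive divisor (C's '/' truncates toward zero). */
static int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b < 0) {
        --q;
    }
    return q;
}

/* Whole-sample symmetric extension of index i into [0, n), for n >= 2. */
static int mirror(int i, int n)
{
    if (i < 0) return -i;
    if (i >= n) return 2 * (n - 1) - i;
    return i;
}

/* Forward 5/3 lifting, in place. Afterwards even indices hold low-pass
   coefficients and odd indices hold high-pass coefficients. */
static void fwd53(int *x, int n)
{
    if (n < 2) return;
    for (int i = 1; i < n; i += 2) {   /* predict step */
        x[i] -= floor_div(x[i - 1] + x[mirror(i + 1, n)], 2);
    }
    for (int i = 0; i < n; i += 2) {   /* update step */
        x[i] += floor_div(x[mirror(i - 1, n)] + x[mirror(i + 1, n)] + 2, 4);
    }
}

/* Inverse transform: undo the update step, then the predict step. */
static void inv53(int *x, int n)
{
    if (n < 2) return;
    for (int i = 0; i < n; i += 2) {
        x[i] -= floor_div(x[mirror(i - 1, n)] + x[mirror(i + 1, n)] + 2, 4);
    }
    for (int i = 1; i < n; i += 2) {
        x[i] += floor_div(x[i - 1] + x[mirror(i + 1, n)], 2);
    }
}
```

On the ramp 10, 12, ..., 24 the forward step leaves the first three high-pass coefficients at zero, and `inv53` restores the input exactly. A 2-D transform applies the same steps along rows and then along columns.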
Experimental results
The proposed OBT-based lifting scheme with the SIMD instructions and the parallel-pass processing are demonstrated in this section.
Table 1 compares the execution time of the 2-D DWT for several image sizes. As shown in Table 1, the lifting method using the proposed OBT memory management scheme reduces the execution time of the lifting algorithm significantly. Note that the reduction in execution time grows as the image size increases. The lifting scheme using the proposed OBT in our
Conclusions
In this paper, we have presented a real-time embedded Motion-JPEG2000 encoding system using a fixed-point DSP chip. To improve the performance of the system, we have proposed an OBT-based lifting scheme that increases the cache hit rate. The OBT-based lifting scheme is over five times faster than the line-based lifting scheme. Moreover, the use of the SIMD instructions and the superscalar pipeline architecture of the DSP has reduced the wavelet execution time by a factor of more than three. In addition, we
References (8)
- et al. An overview of the JPEG2000 still image compression standard. Signal Processing: Image Communication (2002)
- et al. JPEG2000: Image compression fundamentals, standards and practice (2002)
- Information Technology - JPEG2000 Image coding system: Part 1. ISO/IEC International Standard 15444-1...
- Information Technology - JPEG2000 Image coding system: Part 5 - Reference Software. ISO/IEC International Standard 15444-5...