# Design and Implementation of JPEG Encoder IP Core

Chung-Jr Lian, Liang-Gee Chen, Hao-Chieh Chang, and Yung-Chi Chang

DSP/IC Design Lab., Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. Tel: +886-2-2363-5251 ext. 344, Fax: +886-2-2363-8247, E-mail: {cjlian, lgchen}@video.ee.ntu.edu.tw

Abstract: A complete, low-cost baseline JPEG encoder soft IP and its chip implementation are presented in this paper. It features user-defined, run-time reconfigurable quantization tables and a highly modularized, fully pipelined architecture. A prototype, synthesized with the COMPASS cell library, has been implemented in a TSMC 0.6-µm single-poly, triple-metal process. It runs at up to 40 MHz at 3.3 V. This IP can be easily integrated into various application systems, such as scanners, PC cameras, and color fax machines.

# I. INTRODUCTION

IP design and reuse have become a hot issue in the System-on-a-Chip (SOC) era. This paper describes the IP design and implementation of an image encoding hardware architecture based on the baseline JPEG standard[1]. JPEG is a popular international standard for continuous-tone still image coding. Such image encoding hardware is required by a growing number of data-intensive image transmission applications, such as high-resolution color image scanners and digital cameras with million-pixel resolution. Image compression is needed in these high-resolution image-processing devices to overcome long transmission latency and huge storage requirements.

For applications such as PC cameras, digital still cameras, and scanners with embedded compression capability, a dedicated architecture with a hardware-optimized configuration is a cost-effective solution, because it consumes less chip area and power. This motivates the proposed JPEG encoder IP design. Section II describes the proposed architecture, Section III presents the chip implementation results, and Section IV concludes the paper.

# II. ARCHITECTURE

A baseline JPEG system consists of three major parts: the 2-D Discrete Cosine Transform (DCT), the quantizer, and the Huffman coder, as shown in Fig. 1. The detailed block diagram of the proposed architecture is shown in Fig. 2. A cost-effective 2-D DCT module is implemented based on the row-column decomposition architecture[2].

The Zig-Zag scan module relies on a special arrangement of memory reads and writes, so only one 64x11 dual-port memory is needed, whereas another design[10] requires a 128x11 dual-port memory. The latency of this module is also reduced by half.

Quantization is realized by table look-up. The quantization steps stored in the QT-memory address a 1/Q map table that returns the reciprocal of each quantization step. The ZZ-scan module is placed in front of the Quantization module, so the user-specified quantization steps are stored in the QT-memory in Zig-Zag scan order. The quantization steps can therefore be sent directly to the header module for insertion into the bitstream.

The Variable Length Code (VLC) encoder comprises a differential pulse code modulation (DPCM) module, a run-length coding (RLC) module, the Huffman table, a symbol-slicing module, and a bit-packing module. We have proposed an efficient and cost-effective parallel VLC encoder architecture in [3].
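
For reference, the row-column decomposition used by the DCT module can be sketched in floating point as follows. This is a software model under our own assumptions, not the fixed-point datapath of [2]: it applies an 8-point 1-D DCT to every row, transposes the intermediate result (the job of the 64x16 transpose memory), and applies the same 1-D DCT again. The scaling matches the JPEG forward DCT definition; the function names are hypothetical.

```python
import math

N = 8  # block size

def dct_1d(x):
    """8-point DCT-II with orthonormal scaling (sqrt(1/8) for u = 0, 1/2 otherwise)."""
    out = []
    for u in range(N):
        scale = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
        acc = sum(x[i] * math.cos((2 * i + 1) * u * math.pi / (2 * N)) for i in range(N))
        out.append(scale * acc)
    return out

def transpose(m):
    """Swap rows and columns (the job of the transpose memory)."""
    return [list(col) for col in zip(*m)]

def dct_2d_row_column(block):
    """2-D DCT by row-column decomposition: 1-D DCT on every row,
    transpose, 1-D DCT on every row again, transpose back."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(r) for r in transpose(rows)]
    return transpose(cols)
```

Because the 2-D kernel is separable, the 2-D transform reduces to two passes of the same 1-D transform around a transpose memory. For example, a constant level-shifted block of value 10 yields a DC coefficient of 80 and AC coefficients that are zero up to floating-point error.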

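The paper does not detail how the Zig-Zag module gets by with a single 64x11 memory. One well-known way, sketched below, is read-before-write with a block-dependent address map: each cycle the previous block's next Zig-Zag coefficient is read, and the incoming raster-order coefficient is written into the address just freed. This assumes a dual-port memory that returns the old data when reading and writing the same address in one cycle; the class and function names are hypothetical and this is not necessarily the authors' circuit.

```python
def zigzag_order(n=8):
    """Raster indices (row * n + col) listed in JPEG zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):               # s = row + col: one anti-diagonal at a time
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                       # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend(r * n + (s - r) for r in rows)
    return order

class SingleBufferZigZag:
    """Hypothetical model of a 64-entry reorder buffer without double buffering.

    addr[r] is the memory location of raster coefficient r of the stored block.
    Because each write reuses the slot read in the same cycle, the map changes
    from block to block: next_addr[t] = addr[ZZ[t]]."""

    def __init__(self):
        self.zz = zigzag_order()
        self.mem = [0] * 64
        self.addr = list(range(64))

    def push_block(self, raster_block):
        """Write one block (raster order) while reading the previous block out in
        zig-zag order; the first call returns the initial (zero) memory contents."""
        out = []
        next_addr = [0] * 64
        for t in range(64):
            a = self.addr[self.zz[t]]        # slot holding the previous block's t-th zig-zag coeff
            out.append(self.mem[a])          # read old data first
            self.mem[a] = raster_block[t]    # incoming raster coeff t takes the freed slot
            next_addr[t] = a
        self.addr = next_addr
        return out
```

Each output block appears one block period (64 cycles) after it was written, using 64 words instead of the 128 a ping-pong buffer would need.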

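Table look-up quantization replaces a divider with a multiplier: the 8-bit step Q read from the QT-memory indexes a 1/Q table, and the coefficient is multiplied by the reciprocal and shifted right. The sketch below assumes a 20-bit reciprocal rounded up and 11-bit signed coefficients (matching the 64x11 ZZ memory); with these choices the result equals exact rounding of coefficient/Q for every Q from 1 to 255. The table width, the rounding rule, and all names are our assumptions, not the prototype's parameters.

```python
RECIP_BITS = 20  # assumed fixed-point precision of the hypothetical 1/Q table

# Hypothetical 1/Q map table: entry q holds ceil(2**RECIP_BITS / q).
# Entry 0 is unused because Q = 0 is not a legal quantization step.
RECIP_TABLE = [0] + [-(-(1 << RECIP_BITS) // q) for q in range(1, 256)]

def quantize(coef, q):
    """coef / q rounded to nearest (halves away from zero) via multiply and shift."""
    recip = RECIP_TABLE[q]
    mag = (abs(coef) * recip + (1 << (RECIP_BITS - 1))) >> RECIP_BITS
    return mag if coef >= 0 else -mag

def quantize_block(zz_coefs, qt_zz):
    """Coefficients and steps are both in zig-zag order, so index t pairs
    coefficient t with step t without any address translation."""
    return [quantize(c, q) for c, q in zip(zz_coefs, qt_zz)]
```

Storing the steps in Zig-Zag order is what lets the same counter index both the coefficient stream and the QT-memory here.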

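The DPCM and RLC stages turn each quantized, zig-zag-ordered block into the symbols that the Huffman table and bit packer consume. The sketch below follows the baseline JPEG symbol definitions (size category, ZRL, EOB, and the amplitude field for negative values); the Huffman look-up itself is omitted, and the function names are hypothetical rather than the module interfaces of [3].

```python
def size_category(v):
    """JPEG size category (SSSS): number of bits needed for |v|, 0 when v == 0."""
    return abs(v).bit_length()

def amplitude_bits(v, size):
    """JPEG amplitude field: v itself if v >= 0, otherwise v + 2**size - 1."""
    return v if v >= 0 else v + (1 << size) - 1

def vlc_symbols(zz_block, prev_dc):
    """Split one quantized block (zig-zag order) into baseline JPEG symbols.

    Returns (dc, ac, new_prev_dc) where
      dc = (size, amplitude) of the DPCM difference against prev_dc, and
      ac = list of (run, size, amplitude); (15, 0, None) is ZRL (16 zeros)
           and (0, 0, None) is EOB (rest of the block is zero).
    """
    diff = zz_block[0] - prev_dc                 # DPCM on the DC coefficient
    dc_size = size_category(diff)
    dc = (dc_size, amplitude_bits(diff, dc_size))

    ac, run = [], 0
    for v in zz_block[1:]:                       # run-length coding of the 63 AC terms
        if v == 0:
            run += 1
            continue
        while run > 15:                          # runs longer than 15 need ZRL symbols
            ac.append((15, 0, None))
            run -= 16
        size = size_category(v)
        ac.append((run, size, amplitude_bits(v, size)))
        run = 0
    if run:                                      # trailing zeros collapse into one EOB
        ac.append((0, 0, None))
    return dc, ac, zz_block[0]
```

Each (run, size) pair selects a Huffman codeword, and the amplitude supplies `size` extra bits; this codeword/amplitude pair is what the VLC hardware handles per clock cycle.
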
Fig. 1 Block diagram of JPEG encoder



Fig. 2 Block diagram of the proposed architecture

The proposed VLC architecture processes a codeword and its amplitude in one clock cycle. The header and X'FF' marker insertion modules produce a fully JFIF-compliant[4] bitstream, so no extra processor is needed to handle the bitstream syntax. The whole architecture is fully pipelined. It accepts one input sample per clock cycle, and the resulting high throughput meets the speed requirements of various applications. Because the data flow is regular, the control circuit is implemented compactly as counter-based logic with a pause function that freezes all operations. Pipeline stages can be inserted simply by changing the control parameters at compile time.
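
The X'FF' handling above presumably implements the JPEG rule that, inside entropy-coded data, every 0xFF byte must be followed by a stuffed 0x00 byte so that a decoder does not mistake it for a marker. A minimal software model of a bit packer with this byte stuffing might look like the sketch below; the class is hypothetical, and it packs one field per call rather than a codeword and amplitude per cycle as the hardware does.

```python
class BitPacker:
    """Concatenates variable-length fields MSB-first into bytes and stuffs
    0x00 after every 0xFF byte of entropy-coded data."""

    def __init__(self):
        self.acc = 0        # pending bits, right-aligned
        self.nbits = 0      # number of pending bits
        self.out = bytearray()

    def put(self, value, length):
        """Append the `length` low-order bits of `value`."""
        self.acc = (self.acc << length) | (value & ((1 << length) - 1))
        self.nbits += length
        while self.nbits >= 8:
            self.nbits -= 8
            self._emit((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1   # keep only bits not yet emitted

    def _emit(self, byte):
        self.out.append(byte)
        if byte == 0xFF:
            self.out.append(0x00)           # byte stuffing

    def flush(self):
        """Pad the last partial byte with 1-bits, as JPEG requires, and return the data."""
        if self.nbits:
            pad = 8 - self.nbits
            self.put((1 << pad) - 1, pad)
        return bytes(self.out)
```

For example, packing `1111`, `1111`, and `101` produces 0xFF, a stuffed 0x00, and then 0xBF (the 3 bits padded with five 1-bits).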

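The counter-based control with a pause function can be modeled roughly as follows. This is a hypothetical sketch, not the authors' circuit: one counter advances every clock unless paused, each stage is enabled once the counter reaches the summed latency of the stages in front of it, and those latencies are compile-time parameters. Stage names and latency values are placeholders supplied by the caller.

```python
from itertools import accumulate

def stage_start_times(latencies):
    """Cycle at which each stage first sees valid data: the summed latency of
    all stages in front of it (dict order = pipeline order)."""
    names = list(latencies)
    starts = [0] + list(accumulate(latencies[n] for n in names))[:-1]
    return dict(zip(names, starts))

class CounterController:
    """One counter drives every stage enable; pausing holds the counter and
    deasserts all enables, which freezes the whole pipeline at once."""

    def __init__(self, latencies):
        self.start = stage_start_times(latencies)
        self.count = 0

    def tick(self, pause=False):
        """Advance one clock cycle and return each stage's enable signal."""
        if pause:
            return {name: False for name in self.start}
        enables = {name: self.count >= s for name, s in self.start.items()}
        self.count += 1
        return enables
```

Raising one entry of the latency dictionary by one (an inserted pipeline register) delays the enables of every later stage by one cycle without touching any other control logic. The model ignores pipeline draining at the end of an image.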
# III. IMPLEMENTATION

Various color and grayscale images of different sizes are used for functional verification. The compressed bitstreams are decoded correctly by various image-processing software packages, which demonstrates the conformance of the generated bitstream. For silicon verification, this IP is synthesized with the COMPASS cell library and fabricated in the TSMC 0.6-µm 1P3M process. Test multiplexers are added to the I/Os of several important modules to expose internal nodes, which increases observability. To exhaustively test the Huffman table, an extra counter drives the inputs of the table look-up module in test mode, which increases controllability. Fig. 3 shows the chip micrograph. The design is placed and routed flat, with only soft constraints on cell positions to group some modules. The chip is tested with an IMS ATS200 test system and runs at up to 40 MHz at 3.3 V. The gate counts and the chip specification of this prototype are summarized in Table I and Table III, respectively. The shmoo plot measured over different supply voltages and clock frequencies is shown in Fig. 4.

Few detailed and directly comparable data are available, especially for commercial products. In addition, differences in technology mapping and synthesis timing constraints prevent some figures from being compared directly. We therefore list the reference data of [10] in Table II. Compared with [10], our design needs 33.5 k gates instead of 53.4 k, a 64x11 instead of a 128x11 ZZ memory, a 128x8 instead of a 256x8 QT memory, and a fixed Huffman table instead of a 2x512x11 HT memory, which shows that it is highly competitive.

The JPEG encoder has become a very important IP in the market. With an active pixel sensor, an ADC, and a JPEG encoder on one chip, a compression camera SOC is feasible. A scanner with built-in compression can shorten transmission time; we have proposed a memory-efficient preprocessing circuit[5] for such a scanner. For color fax, compound document compression[6] is a future trend. In these applications, an embedded IP that is efficient in power, area, and speed is the best solution in terms of cost and time-to-market. Preprocessing circuits, such as color space transformation and a raster-to-block line buffer, can be easily interfaced with this IP through handshaking signals.
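
As an example of such a preprocessing circuit, raster-to-block conversion can be modeled as buffering eight raster lines and then reading them out block by block. The sketch below is hypothetical (it is not the circuit of [5]) and assumes a single color component whose width and height are multiples of 8.

```python
def raster_to_blocks(lines, width):
    """Buffer 8 raster lines at a time, then emit the 8x8 blocks of that strip
    left to right, each as 64 samples in raster order, ready to be streamed
    into the encoder one sample per cycle."""
    blocks = []
    for band in range(0, len(lines), 8):
        strip = lines[band:band + 8]                  # the 8-line buffer
        for bx in range(0, width, 8):
            blocks.append([strip[y][bx + x] for y in range(8) for x in range(8)])
    return blocks
```

A hardware version would overlap filling the next 8-line strip with reading out the current one; this model only shows the address order.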

# IV. CONCLUSIONS

This paper describes a soft IP design of a low-cost JPEG encoder architecture with re-configurable quantization tables. Compared with other JPEG encoders[7-10], our design is more compact in chip size and cost. Moreover, the fully pipelined architecture with a central controller simplifies control of the data flow. Thus, image compression systems for various applications can be rapidly built around the proposed JPEG core architecture.

# REFERENCES

- [1] ISO/IEC, International Standard DIS 10918, *Digital Compression and Coding of Continuous-Tone Still Images*.
- [2] Avanindra Madisetti and Alan N. Willson, Jr., "A 100 MHz 2-D 8x8 DCT/IDCT Processor for HDTV Applications," *IEEE Transactions on Circuits and Systems for Video Technology*, Vol. 5, No. 2, April 1995.
- [3] Hao-Chieh Chang, Liang-Gee Chen, Yung-Chi Chang, and Sheng-Chieh Huang, "A VLSI Architecture Design of VLC Encoder for High Data Rate Video/Image Coding," *Proc. IEEE International Symposium on Circuits and Systems (ISCAS'99)*, June 1999.
- [4] Eric Hamilton, *JPEG File Interchange Format, Version 1.02*, C-Cube Microsystems, September 1, 1992.
- [5] Chung-Jr Lian, Liang-Gee Chen, Hao-Chieh Chang, and Yung-Chi Chang, "Embedded JPEG Encoder IP Core and Memory Efficient Preprocessing Architecture for Scanner," accepted by *APCCAS 2000*.
- [6] ITU-T Recommendation T.44 (1999), *Mixed Raster Content (MRC)*.
- [7] Peter A. Ruetz, Po Tong, Daniel Luthi, and Peng Hang, "A Video-Rate JPEG Chip Set," *Journal of VLSI Signal Processing*, No. 5, pp. 141-150, 1993.
- [8] Martin Bolton, Richard Boulton, John Martin, Samuel Ng, and Steve Turner, "A Complete Single-Chip Implementation of the JPEG Image Compression Standard," *Proc. IEEE 1991 Custom Integrated Circuits Conference*, 1991.
- [9] Mario Kovac and N. Ranganathan, "JAGUAR: A Fully Pipelined VLSI Architecture for JPEG Image Compression Standard," *Proceedings of the IEEE*, Vol. 83, No. 2, February 1995.
- [10] Integrated Silicon Systems Ltd, *Databook: JPEG IP Core Solutions v2.2*, April 1999. URL: http://www.iss-dsp.com

TABLE I SUMMARY OF GATE COUNTS OF OUR DESIGN

| Block            | Size               |
|------------------|--------------------|
| DCT              | 15 k gates         |
| Packer           | 4.6 k gates        |
| Quant            | 4.2 k gates        |
| Header           | 2.6 k gates        |
| Marker           | 1.4 k gates        |
| Huffman          | 1.3 k gates        |
| Control          | 0.5 k gates        |
| Others           | 3.9 k gates        |
| Total            | 33.5 k gates       |
| Transpose Memory | 64x16, dual port   |
| QT Memory        | 128x8, single port |
| ZZ Memory        | 64x11, dual port   |
| HT Memory        | fixed table        |

TABLE II REFERENCE DATA OF [10] FOR COMPARISON

| Block            | Size                  |
|------------------|-----------------------|
| DCT              | 25 k gates            |
| Parser           | 2.3 k gates           |
| CodCtrl          | 6.5 k gates           |
| HuffEnc          | 6.0 k gates           |
| Packer           | 2.5 k gates           |
| RLEnc            | 7.0 k gates           |
| Quant            | 4.1 k gates           |
| Total            | 53.4 k gates          |
| Transpose Memory | 64x16, dual port      |
| QT Memory        | 256x8, single port    |
| ZZ Memory        | 128x11, dual port     |
| HT Memory        | 2x512x11, single port |



Fig. 3 JPEG encoder chip micrograph

TABLE III CHIP SPECIFICATION

| Item              | Specification          |
|-------------------|------------------------|
| Cell Library      | COMPASS 0.6 µm         |
| Technology        | TSMC 0.6 µm 1P3M CMOS  |
| Core Area         | 5.38 mm x 5.35 mm      |
| Gate Count        | 33,120 (RAM excluded)  |
| Transistor Count  | 170,190                |
| Clock Frequency   | 40 MHz                 |
| Power Dissipation | 310 mW @ 40 MHz, 3.3 V |
| Package           | 144-pin CQFP           |



Fig. 4 Shmoo Plot of Measured Experiment Results