Abstract
Numerous VLSI architectures for the 2-D discrete wavelet transform (DWT) have been proposed. While most of these designs achieve good performance through parallel processing, few address thoroughly how to sustain such high-throughput computing, which is crucial in real-time applications. Although affordable data transfer bandwidth has increased tremendously during the past decade, stream-intensive applications still put pressure on data communication, and the design of 2-D DWT is one such case. In this paper, we expose the performance gap between the computing core and the entire system, distinguishing the two quantitatively with the metrics of peak performance and mean-time performance. To narrow the discrepancy without degrading either criterion, we introduce a software-pipelined, lifting-based computing kernel that removes data dependences to raise peak performance, and we apply loop fusion and a hierarchical pipelining method to enhance data locality and boost mean-time performance. The architecture has been implemented on a Xilinx Virtex-II FPGA, taking advantage of the Virtex-II’s embedded multipliers and block RAMs. We use the Daubechies (9, 7) and LeGall (5, 3) filters (the default lossy and lossless filters in JPEG2000) for illustration, although the method is general and applies to other DWT filters. The post-place-and-route operating frequency for Daubechies (9, 7) is 138 MHz. Notably, the mean-time performance, parameterized by image size and decomposition level, comes close to the peak performance.








Notes
Most designs process only a limited number of lines (\(N_l\)) simultaneously because of area and power constraints, even though the lines are independent. With a small number of filter taps, \(N_f\), the extra storage needed by convolution is \(N_l \times N_f\), remarkably smaller than the whole image (typical sizes around hundreds or even thousands).
As the default lossy filter in JPEG2000, the Daubechies (9, 7) filter is widely used, so more related works are available for comparison. Furthermore, the Daubechies filter is more complicated than the LeGall filter. We therefore consider it more representative for evaluating a related design.
Appendix
Software-pipelining Daubechies (9, 7) 1-D inverse filtering for \(n\): \(\lceil \frac{i_0}{2} \rceil - 2 \leq n < \lceil \frac{i_1}{2} \rceil + 1\)
Software-pipelining LeGall (5, 3) 1-D forward filtering for \(n\): \(\lceil \frac{i_0}{2} \rceil - 2 \leq n < \lceil \frac{i_1}{2} \rceil + 1\)
Software-pipelining LeGall (5, 3) 1-D inverse filtering for \(n\): \(\lceil \frac{i_0}{2} \rceil - 2 \leq n < \lceil \frac{i_1}{2} \rceil + 1\)
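The listings that belong to these captions are not reproduced here. As a point of reference only, the following minimal sketch shows one level of the plain (not software-pipelined) reversible LeGall (5, 3) lifting transform, using the integer predict and update steps of JPEG2000 with whole-sample symmetric extension at the boundaries. The function names `legall53_forward` and `legall53_inverse` and the restriction to even-length signals are assumptions made for illustration; the sketch does not reflect the loop schedule or the index range given in the captions.

```python
def legall53_forward(x):
    """One level of reversible LeGall (5, 3) lifting on an even-length list of ints.

    Returns (s, d): the low-pass and high-pass subbands.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length signal with at least 2 samples")
    half = n // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: d[k] = odd[k] - floor((even[k] + even[k+1]) / 2).
    # Symmetric extension at the right edge mirrors even[half] onto even[half-1].
    d = [odd[k] - ((even[k] + even[min(k + 1, half - 1)]) >> 1)
         for k in range(half)]
    # Update step: s[k] = even[k] + floor((d[k-1] + d[k] + 2) / 4).
    # Symmetric extension at the left edge mirrors d[-1] onto d[0].
    s = [even[k] + ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
         for k in range(half)]
    return s, d


def legall53_inverse(s, d):
    """Undo legall53_forward by running the lifting steps in reverse order."""
    half = len(s)
    if half == 0 or half != len(d):
        raise ValueError("subbands must be non-empty and of equal length")
    # Undo the update step first to recover the even samples.
    even = [s[k] - ((d[max(k - 1, 0)] + d[k] + 2) >> 2)
            for k in range(half)]
    # Then undo the predict step to recover the odd samples.
    odd = [d[k] + ((even[k] + even[min(k + 1, half - 1)]) >> 1)
           for k in range(half)]
    # Interleave even and odd samples back into one signal.
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x
```

Because each lifting step only adds or subtracts a value computed from the other polyphase branch, the inverse reproduces the input exactly, which is what makes the (5, 3) filter suitable for lossless coding. In this plain form the update step reads the result of the predict step, and that is the kind of dependence a software-pipelined schedule reorders across iterations.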
Cite this article
Zhang, C., Long, Y. & Kurdahi, F. A hierarchical pipelining architecture and FPGA implementation for lifting-based 2-D DWT. J Real-Time Image Proc 2, 281–291 (2007). https://doi.org/10.1007/s11554-007-0057-6
DOI: https://doi.org/10.1007/s11554-007-0057-6