A hierarchical pipelining architecture and FPGA implementation for lifting-based 2-D DWT

  • Special Issue
Journal of Real-Time Image Processing

Abstract

Numerous VLSI architectures for the 2-D discrete wavelet transform (DWT) have been proposed. While most of these designs achieve good performance through parallel processing, few of them thoroughly address how to sustain such high-throughput computing, which is crucial in real-time applications. Although affordable data transfer bandwidth has increased tremendously over the past decade, stream-intensive applications still place heavy pressure on data communication, and 2-D DWT is one such application. In this paper, we expose the performance gap between the computing core and the entire system, distinguishing the two quantitatively with two metrics: peak performance and mean-time performance. To narrow this gap without degrading either metric, we introduce a software-pipelining, lifting-based computing kernel that removes data dependences to reach peak performance, and we apply loop fusion and a hierarchical pipelining method to enhance data locality and boost mean-time performance. The architecture has been implemented on a Xilinx Virtex-II FPGA, taking advantage of Virtex-II’s embedded multipliers and block RAMs. We use the Daubechies (9, 7) and LeGall (5, 3) filters (the default lossy and lossless filters in JPEG2000) for illustration, although the method applies generally to other DWT filters. The post-place-and-route operating frequency for the Daubechies (9, 7) filter is 138 MHz. Notably, the mean-time performance, parameterized by image size and decomposition level, closely approaches the peak performance.

[Figs. 1–8]

Notes

  1. Most designs process a limited number of lines (\(N_l\)) simultaneously because of area and power constraints, even though the lines could be processed independently. With a small number of filter taps, \(N_f\), the extra storage needed for convolution is \(N_l \times N_f\), which is remarkably smaller than the whole image (typical image dimensions are in the hundreds or even thousands of pixels).

  2. As the default lossy filter in JPEG2000, the Daubechies (9, 7) filter is widely used, so more related work is available for comparison. Furthermore, the Daubechies filter is more complex than the LeGall filter. We therefore consider it more representative for evaluating a design.
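
To give a sense of scale for note 1, take illustrative values that are our own assumption rather than figures from the paper: \(N_l = 4\) lines processed simultaneously, the 9-tap lowpass analysis filter of the Daubechies (9, 7) pair (\(N_f = 9\)), and a 512 × 512 image.

$$ N_l \times N_f = 4 \times 9 = 36 \text{ samples} \qquad \text{vs.} \qquad 512 \times 512 = 262{,}144 \text{ samples} $$

Under these assumptions the convolution buffer holds more than 7000 times fewer samples than the full image.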

Author information

Correspondence to Chunhui Zhang.

Appendix

Software-pipelining Daubechies (9, 7) 1-D inverse filtering for n: \(\lceil\frac{i_0}{2} \rceil - 2 \leq n < \lceil \frac{i_1} {2}\rceil + 1\)

$$ \begin{aligned} {\rm Tmp}_1(2n) &= K\cdot Y_{\rm ext}(2n) \\{\rm Tmp}_1(2n+1) &= (1/K)\cdot Y_{\rm ext}(2n+1) \\ {\rm Tmp}_0(2n-2) &= {\rm Tmp}_1(2n-2) - \delta \times \left[{\rm Tmp}_1(2n-3) + {\rm Tmp}_1(2n-1)\right] \\ {\rm Tmp}_0(2n-5) &= {\rm Tmp}_1(2n-5) - \gamma \times \left[{\rm Tmp}_0(2n-6) + {\rm Tmp}_0(2n-4)\right] \\ X(2n-8) &= {\rm Tmp}_0(2n-8) - \beta \times \left[{\rm Tmp}_0(2n-9) + {\rm Tmp}_0(2n-7)\right] \\ X(2n-11) &= {\rm Tmp}_0(2n-11) - \alpha \times \left[X(2n-12) + X(2n-10)\right] \\ \end{aligned} $$
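
The schedule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: \(i_0 = 0\), a signal of at least two samples, whole-sample symmetric extension for \(Y_{\rm ext}\), interleaved coefficients (even indices lowpass, odd indices highpass), and the JPEG2000 Part 1 values of \(\alpha, \beta, \gamma, \delta\) and \(K\). The \(\gamma\) stage combines the \(\delta\)-updated even samples (\({\rm Tmp}_0\)), as the inverse lifting requires. All function names are hypothetical, and a plain (non-pipelined) forward transform is included only to produce test input. Inside each iteration the stages are evaluated from the most delayed (\(\alpha\)) to the least delayed (scaling); that order is legal only because no stage reads a value produced in the same iteration, which is the dependence the software pipelining removes. The loop runs until the \(\alpha\) stage, lagging 11 samples behind the scaling stage, has emitted the last odd sample; that epilogue bound is this sketch's own assumption.

```python
# Lifting constants of the irreversible 9/7 filter (JPEG2000 Part 1 values).
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def reflect(i: int, n: int) -> int:
    """Map index i onto a whole-sample symmetric extension of a length-n signal."""
    if n < 2:
        raise ValueError("sketch assumes at least two samples")
    period = 2 * (n - 1)
    i %= period
    return period - i if i >= n else i


def dwt97_forward(x: list[float]) -> list[float]:
    """Plain (non-pipelined) forward 9/7 lifting; used only to make test input."""
    size = len(x)
    v = [float(s) for s in x]
    for coef, parity in ((ALPHA, 1), (BETA, 0), (GAMMA, 1), (DELTA, 0)):
        # Each pass reads only samples of the other parity (reflection keeps
        # index parity), so it can safely update the list in place.
        for i in range(parity, size, 2):
            v[i] += coef * (v[reflect(i - 1, size)] + v[reflect(i + 1, size)])
    return [s / K if i % 2 == 0 else s * K for i, s in enumerate(v)]


def dwt97_inverse_pipelined(y: list[float]) -> list[float]:
    """Software-pipelined inverse 9/7 lifting following the appendix schedule."""
    size = len(y)

    def y_ext(i: int) -> float:
        return y[reflect(i, size)]

    tmp1: dict[int, float] = {}  # scaled coefficients, Tmp_1
    tmp0: dict[int, float] = {}  # after the delta (even) and gamma (odd) stages, Tmp_0
    x: dict[int, float] = {}  # reconstructed samples, including a few out of range
    for n in range(-2, (size + 1) // 2 + 6):
        # Stages run from most to least delayed; this order works only because
        # no stage reads a value written earlier in the same iteration.
        if 2 * n - 11 >= 1:  # alpha stage: odd output X(2n-11)
            x[2 * n - 11] = tmp0[2 * n - 11] - ALPHA * (x[2 * n - 12] + x[2 * n - 10])
        if 2 * n - 8 >= 0:  # beta stage: even output X(2n-8)
            x[2 * n - 8] = tmp0[2 * n - 8] - BETA * (tmp0[2 * n - 9] + tmp0[2 * n - 7])
        if 2 * n - 5 >= -1:  # gamma stage: odd Tmp_0(2n-5) from delta-updated evens
            tmp0[2 * n - 5] = tmp1[2 * n - 5] - GAMMA * (tmp0[2 * n - 6] + tmp0[2 * n - 4])
        if 2 * n - 2 >= -2:  # delta stage: even Tmp_0(2n-2)
            tmp0[2 * n - 2] = tmp1[2 * n - 2] - DELTA * (tmp1[2 * n - 3] + tmp1[2 * n - 1])
        # Scaling stage: undo the forward 1/K (even) and K (odd) gains.
        tmp1[2 * n] = K * y_ext(2 * n)
        tmp1[2 * n + 1] = y_ext(2 * n + 1) / K
    return [x[i] for i in range(size)]


signal = [float((7 * i * i + 3) % 17) for i in range(9)]
restored = dwt97_inverse_pipelined(dwt97_forward(signal))
print(max(abs(a - b) for a, b in zip(signal, restored)))  # floating-point rounding only
```

Each iteration touches a fixed window of indices, so in hardware the dictionaries would become short delay lines; they are used here only to keep the index bookkeeping visible.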

Software-pipelining LeGall (5, 3) 1-D forward filtering for n: \(\lceil \frac{i_0}{2}\rceil - 2 \leq n < \lceil \frac{i_1} {2}\rceil + 1\)

$$ \begin{aligned} Y(2n+1)&= X_{\rm ext}(2n+1) - \left \lfloor {\frac{X_{\rm ext}(2n)+X_{\rm ext}(2n+2)} {2}} \right \rfloor \\ Y(2n-2)&= X_{\rm ext}(2n-2) + \left \lfloor {\frac{Y(2n-3)+Y(2n-1)+2} {4}}\right \rfloor \\ \end{aligned} $$
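
A minimal Python sketch of this forward schedule, assuming \(i_0 = 0\) (so \(n\) runs from \(-2\) up to \(\lceil N/2 \rceil\)), integer input of at least two samples, whole-sample symmetric extension for \(X_{\rm ext}\), and interleaved output (even indices lowpass, odd indices highpass). As in the JPEG2000 reversible 5/3 transform, the predict step subtracts and the update step adds. The function names are hypothetical. Each iteration computes the predict result for sample \(2n+1\) and the update for sample \(2n-2\); the update reads only odd results from the two previous iterations, so the sketch keeps them in two registers instead of a line buffer.

```python
def reflect(i: int, n: int) -> int:
    """Map index i onto a whole-sample symmetric extension of a length-n signal."""
    if n < 2:
        raise ValueError("sketch assumes at least two samples")
    period = 2 * (n - 1)
    i %= period
    return period - i if i >= n else i


def legall53_forward_pipelined(x: list[int]) -> list[int]:
    """Software-pipelined 5/3 forward lifting with i0 = 0 (hypothetical helper).

    Returns interleaved coefficients: even indices lowpass, odd indices highpass.
    """
    size = len(x)

    def x_ext(i: int) -> int:
        return x[reflect(i, size)]

    y = [0] * size
    odd_prev2 = odd_prev1 = 0  # Y(2n-3) and Y(2n-1) from the two previous iterations
    for n in range(-2, (size + 1) // 2 + 1):
        # Predict stage (leading): odd sample 2n+1 from its even neighbours.
        # Python's // is floor division, matching the floor brackets above.
        odd_new = x_ext(2 * n + 1) - (x_ext(2 * n) + x_ext(2 * n + 2)) // 2
        # Update stage (lagging by three samples): even sample 2n-2 uses only
        # odd results from earlier iterations, so the two stages are independent.
        even_idx = 2 * n - 2
        if 0 <= even_idx < size:
            y[even_idx] = x_ext(even_idx) + (odd_prev2 + odd_prev1 + 2) // 4
        if 0 <= 2 * n + 1 < size:
            y[2 * n + 1] = odd_new
        odd_prev2, odd_prev1 = odd_prev1, odd_new
    return y


print(legall53_forward_pipelined([1, 2, 3, 4, 5, 6]))  # [1, 0, 3, 0, 5, 1]
```

For a linear ramp the interior highpass coefficients are zero; the final 1 comes from the symmetric extension at the right boundary.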

Software-pipelining LeGall (5, 3) 1-D inverse filtering for n: \(\lceil \frac{i_0}{2}\rceil - 2 \leq n < \lceil \frac{i_1} {2}\rceil + 1\)

$$ \begin{aligned} X(2n)&= Y_{\rm ext}(2n) - \left \lfloor {\frac{Y_{\rm ext}(2n-1)+Y_{\rm ext}(2n+1)+2} {4}} \right \rfloor \\ X(2n-3)&= Y_{\rm ext}(2n-3) + \left \lfloor {\frac{X(2n-4)+X(2n-2)} {2}} \right \rfloor \\ \end{aligned} $$
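
A minimal Python sketch of this inverse schedule under the same assumptions as a forward 5/3 sketch would use: \(i_0 = 0\), integer coefficients of at least two samples, interleaved input (even indices lowpass, odd indices highpass), and whole-sample symmetric extension for \(Y_{\rm ext}\). It undoes the JPEG2000 reversible 5/3 steps, so the even stage subtracts and the odd stage adds. The function names are hypothetical. The odd stage lags the even stage by three samples and reads only even samples reconstructed in the two previous iterations. The sketch runs one iteration past the upper bound printed above so that the lagging stage also emits \(X(N-1)\) when the length \(N\) is even; that epilogue bound is our assumption.

```python
def reflect(i: int, n: int) -> int:
    """Map index i onto a whole-sample symmetric extension of a length-n signal."""
    if n < 2:
        raise ValueError("sketch assumes at least two samples")
    period = 2 * (n - 1)
    i %= period
    return period - i if i >= n else i


def legall53_inverse_pipelined(y: list[int]) -> list[int]:
    """Software-pipelined 5/3 inverse lifting with i0 = 0 (hypothetical helper)."""
    size = len(y)

    def y_ext(i: int) -> int:
        return y[reflect(i, size)]

    x = [0] * size
    even_prev2 = even_prev1 = 0  # X(2n-4) and X(2n-2) from the two previous iterations
    # One extra iteration (epilogue) so the lagging odd stage drains.
    for n in range(-2, (size + 1) // 2 + 2):
        # Leading stage: undo the update step for even sample 2n.
        even_new = y_ext(2 * n) - (y_ext(2 * n - 1) + y_ext(2 * n + 1) + 2) // 4
        # Lagging stage: undo the predict step for odd sample 2n-3, using even
        # samples reconstructed in the two previous iterations.
        odd_idx = 2 * n - 3
        if 0 <= odd_idx < size:
            x[odd_idx] = y_ext(odd_idx) + (even_prev2 + even_prev1) // 2
        if 0 <= 2 * n < size:
            x[2 * n] = even_new
        even_prev2, even_prev1 = even_prev1, even_new
    return x


print(legall53_inverse_pipelined([1, 0, 3, 0, 5, 1]))  # [1, 2, 3, 4, 5, 6]
```

Because both stages use integer floor division, the reconstruction is exact, which is what makes the 5/3 filter suitable for lossless coding.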

About this article

Cite this article

Zhang, C., Long, Y. & Kurdahi, F. A hierarchical pipelining architecture and FPGA implementation for lifting-based 2-D DWT. J Real-Time Image Proc 2, 281–291 (2007). https://doi.org/10.1007/s11554-007-0057-6

  • DOI: https://doi.org/10.1007/s11554-007-0057-6
