Parallel strategies for 2D Discrete Wavelet Transform in shared memory systems and GPUs

The Journal of Supercomputing

Abstract

In this work, we analyze the behavior of several parallel algorithms developed to compute the two-dimensional discrete wavelet transform using both OpenMP on a multicore platform and CUDA on a GPU. The proposed parallel algorithms are based on both regular filter-bank convolution and the lifting transform, with small implementation changes aimed at reducing both memory requirements and complexity. We compare our implementations against sequential CPU algorithms and other recently proposed algorithms, such as the SMDWT algorithm on different CPUs and the Wippig and Klauer algorithm on a GTX280 GPU. Finally, we analyze their behavior when the algorithms are adapted to each architecture. Significant execution time improvements are achieved on both multicore platforms and GPUs. Depending on the multicore platform used, we achieve speed-ups of 1.9 and 3.4 using two and four processes, respectively, or speed-ups of 7.1 and 8.9 using eight and ten processes, when compared to the sequential CPU algorithm. Regarding GPUs, the GPU convolution algorithm using GPU shared memory obtains speed-ups of up to 20 when compared to the sequential CPU algorithm.
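To make the lifting-based approach mentioned above more concrete, the following is a minimal sequential sketch of a one-level 2D transform built from 1D lifting steps. It is only an illustration under stated assumptions: it uses the unnormalized Haar wavelet for brevity (the abstract does not say which filters the authors use), the function names `lift_haar_1d` and `dwt2d_haar` are hypothetical, and the subband naming (first letter for the row pass, second for the column pass) is one common convention among several.

```python
def lift_haar_1d(x):
    """One level of unnormalized Haar lifting on an even-length sequence.

    Returns (approx, detail), each half the length of x.
    """
    even = x[0::2]
    odd = x[1::2]
    # Predict step: each odd sample is predicted by its even neighbour.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: adjust even samples so approx holds the pair averages.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def _lift_columns(block):
    """Apply lift_haar_1d to every column of a row-major 2D list."""
    cols = [lift_haar_1d(list(c)) for c in zip(*block)]
    approx = [list(r) for r in zip(*[a for a, _ in cols])]
    detail = [list(r) for r in zip(*[d for _, d in cols])]
    return approx, detail


def dwt2d_haar(image):
    """One-level separable 2D transform: row pass, then column pass.

    image is a row-major list of lists with even height and width.
    Returns a dict with the four subbands LL, HL, LH, HH
    (assumed naming: first letter = row pass, second = column pass).
    """
    # Row pass: every row is transformed independently of the others.
    rows = [lift_haar_1d(r) for r in image]
    low = [a for a, _ in rows]
    high = [d for _, d in rows]
    # Column pass on each half: every column is likewise independent.
    ll, lh = _lift_columns(low)
    hl, hh = _lift_columns(high)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


if __name__ == "__main__":
    img = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
    for name, band in dwt2d_haar(img).items():
        print(name, band)
```

Because each row, and then each column, is processed independently in this sketch, the two passes are natural candidates for a parallel loop over threads or GPU blocks. How the authors actually partition the work in their OpenMP and CUDA versions is not stated in the abstract.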

References

  1. Rao K, Yip P (1990) Discrete cosine transform: algorithms, advantages, applications. Academic Press, Boston

  2. ISO (2000) ISO/IEC 15444-1. JPEG2000 image coding system

  3. Said A, Pearlman WA (1996) A new, fast and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans Circuits Syst Video Technol 6(3):243–250

  4. Mallat SG (1989) A theory for multi-resolution signal decomposition: The wavelet representation. IEEE Trans Pattern Anal Mach Intell 11(7):674–693

  5. Sweldens W (1996) The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl Comput Harmon Anal 3(2):186–200

  6. Sweldens W (1998) The lifting scheme: a construction of second generation wavelets. SIAM J Math Anal 29(2):511–546

  7. Chrysafis C, Ortega A (2000) Line-based, reduced memory, wavelet image compression. IEEE Trans Image Process 9(3):378–389

  8. Bao Y, Jay Kuo CC (2001) Design of wavelet-based image codec in memory-constrained environment. IEEE Trans Circuits Syst Video Technol 11(5):642–650

  9. Hsia C-H, Guo J-M, Chiang J-S, Lin C-H (2009) A novel fast algorithm based on SMDWT for visual processing applications. In: IEEE international symposium on circuits and systems, ISCAS 2009, pp 762–765

  10. Wippig D, Klauer B (2011) GPU-based translation-invariant 2d discrete wavelet transform for image processing. Int J Comput 5(2):226–234

  11. Rost RJ (2006) OpenGL© shading language, 2nd edn. Addison-Wesley, Reading

  12. Daubechies I, Sweldens W (1998) Factoring wavelet transforms into lifting steps. J Fourier Anal Appl 4(3):247–269

  13. Shapiro JM (1993) Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans Signal Process 41(12):3445–3462

  14. OpenMP Architecture Review Board (2002) OpenMP C and C++ application program interface, version 2.0

  15. Nickolls J, Buck I, Garland M, Skadron K (2008) Scalable parallel programming with CUDA. Queue 6:40–53

  16. NVIDIA Corporation (2010) NVIDIA CUDA C programming guide, version 3.2

Acknowledgements

This research was partially supported by the Spanish Ministry of Science and Innovation under grant numbers TIN2011-27543-C03-03 and TIN2011-26254.

Author information

Corresponding author

Correspondence to V. Galiano.

About this article

Cite this article

Galiano, V., López, O., Malumbres, M.P. et al. Parallel strategies for 2D Discrete Wavelet Transform in shared memory systems and GPUs. J Supercomput 64, 4–16 (2013). https://doi.org/10.1007/s11227-012-0750-5

  • DOI: https://doi.org/10.1007/s11227-012-0750-5
