Abstract
The three-dimensional discrete wavelet transform (3D-DWT) has attracted the attention of the research community, especially in areas such as video watermarking, compression of volumetric medical data, multispectral image coding, 3D model coding, and video coding. In this work, we present several strategies to speed up the 3D-DWT computation through multicore processing. An in-depth analysis of the available compiler optimizations is also presented. Depending on both the multicore platform and the group of pictures (GOP) size, the developed parallel algorithm obtains efficiencies above 95 % using up to four cores (or processes), and above 83 % using up to 12 cores. Furthermore, the extra memory requirement is under 0.12 % for low-resolution video frames and under 0.017 % for high-resolution video frames. We also present a CUDA-based algorithm that computes the 3D-DWT using shared memory for the extra memory demands, obtaining speed-ups of up to 12.68 on the many-core GTX280 platform. In areas such as video processing or ultra-high-definition image processing, memory requirements can significantly degrade the performance of the developed algorithms. Our algorithm, however, increases the memory requirements by a negligible percentage and is able to perform a nearly in-place computation of the 3D-DWT. By contrast, other state-of-the-art 3D-DWT algorithms commonly store the computed wavelet coefficients in a separate memory space, thereby doubling the memory requirements.
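To make the idea of a nearly in-place 3D-DWT concrete, the following is a minimal sketch, not the algorithm of this paper. It assumes the Haar filter (chosen only for brevity), a single decomposition level, even dimensions, and a volume stored as a flat list; the function names `haar_line_inplace` and `dwt3d_haar_inplace` are hypothetical. The only extra memory is one scratch buffer as long as the largest dimension, and every line is transformed independently, which is what would allow the lines to be distributed among cores.

```python
def haar_line_inplace(data, start, stride, n, buf):
    """One-level Haar analysis of the n samples data[start + i*stride].

    The result overwrites the input line: low-pass coefficients go to the
    first half, high-pass coefficients to the second half. buf is a reusable
    scratch list of length >= n (assumption: n is even).
    """
    half = n // 2
    for i in range(half):
        a = data[start + (2 * i) * stride]
        b = data[start + (2 * i + 1) * stride]
        buf[i] = (a + b) / 2.0         # low-pass: average of the pair
        buf[half + i] = (a - b) / 2.0  # high-pass: half-difference of the pair
    for i in range(n):
        data[start + i * stride] = buf[i]


def dwt3d_haar_inplace(vol, frames, rows, cols):
    """One-level separable 3D Haar transform of a flat frames*rows*cols list.

    Returns the number of extra samples used (the scratch buffer length).
    """
    buf = [0.0] * max(frames, rows, cols)
    # Horizontal pass: each row of each frame is a contiguous line.
    for t in range(frames):
        for y in range(rows):
            haar_line_inplace(vol, (t * rows + y) * cols, 1, cols, buf)
    # Vertical pass: each column of each frame, stride = cols.
    for t in range(frames):
        for x in range(cols):
            haar_line_inplace(vol, t * rows * cols + x, cols, rows, buf)
    # Temporal pass: one line per pixel position, stride = rows * cols.
    for y in range(rows):
        for x in range(cols):
            haar_line_inplace(vol, y * cols + x, rows * cols, frames, buf)
    return len(buf)


if __name__ == "__main__":
    frames, rows, cols = 4, 4, 4
    vol = [3.0] * (frames * rows * cols)  # constant toy "GOP"
    extra = dwt3d_haar_inplace(vol, frames, rows, cols)
    print("extra samples:", extra, "of", len(vol))
```

In this toy case the buffer holds 4 of 64 samples; because the buffer grows with a single dimension while the volume grows with the product of all three, the relative overhead shrinks as frames get larger. A parallel version would need one such buffer per thread or process.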

Acknowledgements
This research was partially supported by the Spanish Ministry of Education and Science under grant DPI2007-66796-C03-03 and the Spanish Ministry of Science and Innovation under grant number TIN2008-06570-C04-04.
Cite this article
Galiano, V., López-Granado, O., Malumbres, M.P. et al. Fast 3D wavelet transform on multicore and many-core computing platforms. J Supercomput 65, 848–865 (2013). https://doi.org/10.1007/s11227-013-0868-0
DOI: https://doi.org/10.1007/s11227-013-0868-0