Abstract
Thanks to their computational power, today’s GPUs are increasingly harnessed to assist CPUs in high-performance computing. In addition, a growing number of state-of-the-art supercomputers include commodity GPUs to deliver unprecedented levels of performance in terms of raw GFLOPS and GFLOPS/cost. In this work, we present a GPU implementation of an image processing application of growing popularity: the 2D fast wavelet transform (2D-FWT). Based on a pair of Quadrature Mirror Filters, a complete set of application-specific optimizations is developed from a CUDA perspective to achieve outstanding speedup factors over a highly optimized version of the 2D-FWT run on the CPU. An alternative approach based on the Lifting Scheme is also described in Franco et al. (Acceleration of the 2D wavelet transform for CUDA-enabled devices, 2010). We then investigate hardware improvements such as multicores on the CPU side and exploit them through thread-level parallelism using the OpenMP API and pthreads. Overall, the GPU exhibits better scalability and parallel performance on large-scale images, making it a solid alternative for computing the 2D-FWT compared with those thread-level methods run on emerging multicore architectures.
References
Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Kruger, J., Lefohn, A.E., Purcell, T.J.: A survey of general-purpose computation on graphics hardware. J. Comput. Graph. Forum 26, 21–51 (2007)
Mallat, S.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 674–693 (1989)
Bernabé, G., González, J., García, J.M., Duato, J.: A new lossy 3-D wavelet transform for high-quality compression of medical video. In: IEEE EMBS International Conference on Information Technology Applications in Biomedicine (2000)
Daubechies, I.: Ten lectures on wavelets. Soc. Ind. Appl. Math. (1992)
Tenllado, C., Setoain, J., Prieto, M., Piñuel, L., Tirado, F.: Parallel implementation of the 2D discrete wavelet transform on graphics processing units: filter bank versus lifting. IEEE Trans. Parallel Distrib. Syst. 19(2), 299–310 (2008)
Meerwald, P., Norcen, R., Uhl, A.: Cache issues with JPEG2000 wavelet lifting. In: VCIP, vol. 4671, pp. 626–634 (2002)
Tao, J., Shahbahrami, A., Juurlink, B., Buchty, R., Karl, W., Vassiliadis, S.: Optimizing cache performance of the discrete wavelet transform using a visualization tool. In: 9th IEEE International Symposium on Multimedia, pp. 153–160 (2007)
Shahbahrami, A., Juurlink, B., Vassiliadis, S.: Improving the memory behavior of vertical filtering in the discrete wavelet transform. In: Conference on Computing Frontiers. ACM, pp. 253–260 (2006)
Kirk, D., Hwu, W.: Programming massively parallel processors: a hands-on approach. Morgan Kaufmann, Menlo Park. ISBN: 978-0-12-381472-2 (2010)
Intel C++ Compiler Options (Document Number: 307776-002US) (2007)
GNU Compiler Collection (GCC). http://gcc.gnu.org (2010)
OpenMP: The OpenMP API. http://www.openmp.org (2010)
Moreland, K., Angel, E.: The FFT on a GPU. In: SIGGRAPH Eurographics 6th Workshop on Computer Graphics Hardware, San Diego (California, US), 26–27 July, pp. 112–119 (2003)
NVIDIA Corporation: NVIDIA CUDA CUFFT Library, Version 1.1 (2007)
Govindaraju, N., Lloyd, B., Dotsenko, Y., Smith, B., Manferdelli, J.: High performance discrete Fourier transforms on graphics processors. In: Proceedings Supercomputing 2008, Austin, TX (USA) (2008)
Nukada, A., Ogata, Y., Endo, T., Matsuoka, S.: Bandwidth intensive 3D FFT kernel for GPUs using CUDA. In: Proceedings Supercomputing 2008, Austin, TX (USA) (2008)
Wong, T.T., Leung, C.S., Heng, P.A., Wang, J.: Discrete wavelet transform on consumer-level graphics hardware. IEEE Trans. Multimedia 9(3), 668–673 (2007)
Franco, J., Bernabe, G., Fernandez, J., Acacio, M.E., Ujaldon, M.: Acceleration of the 2D wavelet transform for CUDA-enabled devices. In: 10th PARA’2010: State of the Art in Scientific and Parallel Computing. Minisymposium on GPU Computing. Reykjavik (Iceland), June (2010)
Franco, J., Bernabe, G., Fernandez, J., Ujaldon, M.: Parallel 3D wavelet transform on multicore CPUs and Manycore GPUs. In: 10th International Conference on Computational Science. 2nd Workshop on Emerging Parallel Architectures. Amsterdam (The Netherlands), May (2010)
Sumanaweera, T., Liu, D.: Medical image reconstruction with the FFT. In: Matt Pharr (ed.) GPU Gems 2, pp. 765–784. Addison-Wesley, Reading (2005)
Additional information
This work has been supported by the Spanish MEC and EU FEDER funds under grants “Consolider Ingenio-2010 CSD2006-00046” and “TIN2006-15516-C04-03”.
Cite this article
Franco, J., Bernabé, G., Fernández, J. et al. The 2D wavelet transform on emerging architectures: GPUs and multicores. J Real-Time Image Proc 7, 145–152 (2012). https://doi.org/10.1007/s11554-011-0224-7
DOI: https://doi.org/10.1007/s11554-011-0224-7