Fast Wavelet Transform Utilizing a Multicore-Aware Framework

  • Conference paper
Applied Parallel and Scientific Computing (PARA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7134)

Abstract

The move to multicore processors places new demands on software development if applications are to profit from the improved hardware capabilities. Most importantly, algorithms and code must be parallelized wherever possible, and the growing memory wall must also be taken into account. In addition, high computational performance can only be reached if architecture-specific features are exploited. To address this complexity, we developed a C++ framework that simplifies the development of performance-optimized, parallel, memory-efficient, stencil-based codes on standard multicore processors and on the heterogeneous Cell processor developed jointly by Sony, Toshiba, and IBM. We illustrate the implementation and optimization of the Fast Wavelet Transform and its inverse for Haar wavelets in three variants: within our hybrid framework, with OpenMP, and with the Open Computing Language (OpenCL). We then analyze the performance results on different platforms.
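For readers unfamiliar with the transform named in the abstract, the following is a minimal serial sketch of a one-dimensional Haar Fast Wavelet Transform and its inverse. It is not the authors' framework code: the function names `haar_forward` and `haar_inverse` are hypothetical, the input length is assumed to be a power of two, and the unnormalized averaging/differencing variant of the Haar filters is assumed for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: full multi-level Haar decomposition of a 1D signal.
// Assumes data.size() is a power of two. Uses the unnormalized variant:
// approximation = (a + b) / 2, detail = (a - b) / 2.
std::vector<double> haar_forward(std::vector<double> data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // length must be a power of two
    std::vector<double> tmp(n);
    for (std::size_t len = n; len > 1; len /= 2) {
        const std::size_t half = len / 2;
        // Iterations of this loop are independent of each other, which is
        // what makes each level straightforward to parallelize.
        for (std::size_t i = 0; i < half; ++i) {
            tmp[i]        = 0.5 * (data[2 * i] + data[2 * i + 1]);
            tmp[half + i] = 0.5 * (data[2 * i] - data[2 * i + 1]);
        }
        for (std::size_t i = 0; i < len; ++i) data[i] = tmp[i];
    }
    return data;
}

// Hypothetical helper: inverse of haar_forward, rebuilding the signal
// from the coarsest level upward.
std::vector<double> haar_inverse(std::vector<double> coeffs) {
    const std::size_t n = coeffs.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // length must be a power of two
    std::vector<double> tmp(n);
    for (std::size_t len = 2; len <= n; len *= 2) {
        const std::size_t half = len / 2;
        for (std::size_t i = 0; i < half; ++i) {
            tmp[2 * i]     = coeffs[i] + coeffs[half + i];
            tmp[2 * i + 1] = coeffs[i] - coeffs[half + i];
        }
        for (std::size_t i = 0; i < len; ++i) coeffs[i] = tmp[i];
    }
    return coeffs;
}
```

Each level reads pairs of neighbouring samples and writes one approximation and one detail coefficient, so the work halves from level to level and every level is a simple streaming pass over memory.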



Editor information

Kristján Jónasson

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stürmer, M., Köstler, H., Rüde, U. (2012). Fast Wavelet Transform Utilizing a Multicore-Aware Framework. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28145-7_31

  • DOI: https://doi.org/10.1007/978-3-642-28145-7_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28144-0

  • Online ISBN: 978-3-642-28145-7

  • eBook Packages: Computer Science, Computer Science (R0)
