An Approach for Semiautomatic Locality Optimizations Using OpenMP

  • Conference paper
Applied Parallel and Scientific Computing (PARA 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7134)

Abstract

The processing power of multicore CPUs is increasing at a high rate, whereas memory bandwidth is falling behind. Almost all modern processors use multiple cache levels to overcome the penalty of slow main memory; however, cache efficiency depends directly on data locality. This paper studies one way to expose data locality in the syntax of the parallel programming system OpenMP. We study data locality optimizations for two applications: matrix multiplication and a Gauß-Seidel stencil. We show that only small changes to OpenMP are required to expose data locality so that a compiler can transform the code. Our notion of tiled loops allows developers to describe data locality easily, even in scenarios with non-trivial data dependencies. Furthermore, we describe two optimization techniques. The first explicitly uses a form of local memory to prevent conflict cache misses, whereas the second modifies the wavefront parallel programming pattern with dynamically sized blocks to increase the number of parallel tasks. As an additional contribution, we explore the benefit of using multiple levels of tiling.
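To make the idea of tiling for data locality concrete, the following is a minimal sketch of a hand-tiled matrix multiplication that uses only standard OpenMP. It is not the extended syntax proposed in the paper, which the abstract does not show; it illustrates the kind of transformation a compiler would otherwise perform. The names `matmul_naive`, `matmul_tiled`, and the tile size `TILE` are assumptions chosen for illustration.

```c
#include <assert.h>

/* Hypothetical tile edge length; in practice it would be tuned to the cache. */
#define TILE 32

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference version: C = A * B for row-major n x n matrices. */
void matmul_naive(int n, const double *A, const double *B, double *C)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Hand-tiled version: the three outer loops walk over tiles, the three
 * inner loops work inside one tile so its data can stay in cache. */
void matmul_tiled(int n, const double *A, const double *B, double *C)
{
    assert(n >= 0);
    for (int i = 0; i < n * n; ++i)
        C[i] = 0.0;

    /* Each (ii, jj) pair owns one tile of C, so threads never write to
     * the same element. Without OpenMP support the pragma is ignored
     * and the loop nest runs sequentially. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int kk = 0; kk < n; kk += TILE) {
                int i_end = min_int(ii + TILE, n);
                int j_end = min_int(jj + TILE, n);
                int k_end = min_int(kk + TILE, n);
                for (int i = ii; i < i_end; ++i)
                    for (int k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        for (int j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both functions take row-major arrays of `n * n` doubles. Parallelizing over tiles of `C` rather than over `kk` keeps the accumulation into each element on a single thread, which avoids the need for a reduction. A stencil such as Gauß-Seidel has loop-carried dependencies and cannot be parallelized this simply, which is where a wavefront ordering of the tiles comes in.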

Editor information

Kristján Jónasson

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Breitbart, J. (2012). An Approach for Semiautomatic Locality Optimizations Using OpenMP. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28145-7_29

  • DOI: https://doi.org/10.1007/978-3-642-28145-7_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28144-0

  • Online ISBN: 978-3-642-28145-7

  • eBook Packages: Computer Science, Computer Science (R0)
