Binding Nested OpenMP Programs on Hierarchical Memory Architectures

  • Conference paper
Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More (IWOMP 2010)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6132)

Abstract

In this work we discuss the performance problems that nested OpenMP programs face with respect to thread and data locality, particularly on cc-NUMA architectures. We provide a user-friendly solution and demonstrate its benefits by comparing the performance of several kernel benchmarks and real-world applications with and without our affinity optimizations.

This research is partially supported by the German Federal Ministry of Education and Research (BMBF) under the contract 03SF0326A “MeProRisk: Novel methods for exploration, development, and exploitation of geothermal reservoirs - a toolbox for prognosis and risk assessment.”

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schmidl, D., Terboven, C., an Mey, D., Bücker, M. (2010). Binding Nested OpenMP Programs on Hierarchical Memory Architectures. In: Sato, M., Hanawa, T., Müller, M.S., Chapman, B.M., de Supinski, B.R. (eds) Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More. IWOMP 2010. Lecture Notes in Computer Science, vol 6132. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13217-9_3

  • DOI: https://doi.org/10.1007/978-3-642-13217-9_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13216-2

  • Online ISBN: 978-3-642-13217-9

  • eBook Packages: Computer Science, Computer Science (R0)
