Abstract
In this work we discuss the performance problems of nested OpenMP programs that arise from poor thread and data locality, particularly on cc-NUMA architectures. We provide a user-friendly solution and demonstrate its benefits by comparing the performance of several kernel benchmarks and real-world applications with and without our affinity optimizations.
This research is partially supported by the German Federal Ministry of Education and Research (BMBF) under the contract 03SF0326A “MeProRisk: Novel methods for exploration, development, and exploitation of geothermal reservoirs - a toolbox for prognosis and risk assessment.”
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schmidl, D., Terboven, C., an Mey, D., Bücker, M. (2010). Binding Nested OpenMP Programs on Hierarchical Memory Architectures. In: Sato, M., Hanawa, T., Müller, M.S., Chapman, B.M., de Supinski, B.R. (eds) Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More. IWOMP 2010. Lecture Notes in Computer Science, vol 6132. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13217-9_3
DOI: https://doi.org/10.1007/978-3-642-13217-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13216-2
Online ISBN: 978-3-642-13217-9
eBook Packages: Computer Science (R0)