Abstract
Maximizing the scope of a parallel region, which avoids the costs of barriers and of launching additional parallel regions, is among the first recommendations in any optimization guide for OpenMP. While clearly beneficial and easily accomplished for code where regions are visibly contiguous, regions often become contiguous only after compiler optimization or resolution of abstraction layers. This paper explores changes to the OpenMP specification that would allow implementations to merge adjacent parallel regions automatically, including the removal of issues that make the transformation non-conforming and the addition of hints that facilitate the optimization. Beyond simple merging, we explore hints to fuse workshared loops that occur in syntactically distinct parallel regions or to apply nowait to such loops. Our evaluation shows these changes can provide an overall speedup of 2–8× for a microbenchmark, or 6% for a representative physics application.
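As an illustration of the transformation the abstract describes, the following sketch shows two adjacent parallel regions and a hand-merged equivalent. It uses only standard OpenMP constructs, not the hints proposed in the paper, and the function names and problem size are hypothetical. The nowait on the first loop assumes both loops use schedule(static) with the same iteration count inside one parallel region, so each thread only reads elements it wrote itself.

```c
#include <assert.h>

enum { N = 1024 };  /* illustrative problem size (assumption) */

/* Two adjacent parallel regions: each launches its own region and
   ends with an implicit barrier. */
static void update_separate(double *a, const double *b, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * b[i];

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        a[i] = a[i] + 1.0;
}

/* Hand-merged version: one parallel region containing two workshared
   loops. With identical static schedules and iteration counts, each
   thread gets the same iterations in both loops, so the barrier after
   the first loop can be dropped with nowait. */
static void update_merged(double *a, const double *b, int n)
{
    #pragma omp parallel
    {
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; ++i)
            a[i] = 2.0 * b[i];

        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            a[i] = a[i] + 1.0;
    }
}

/* Runs both versions on the same input and returns the number of
   elements on which they disagree. */
static int count_mismatches(void)
{
    static double b[N], x[N], y[N];

    for (int i = 0; i < N; ++i)
        b[i] = (double)i;

    update_separate(x, b, N);
    update_merged(y, b, N);

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        assert(x[i] == 2.0 * b[i] + 1.0);  /* reference result */
        if (x[i] != y[i])
            ++mismatches;
    }
    return mismatches;
}
```

Compiled without OpenMP support, the pragmas are ignored and both functions run serially with the same result. With OpenMP enabled, the merged version launches one region instead of two and synchronizes once instead of twice.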
This material is based upon work supported by the U.S. Department of Energy (LLNL-CONF-670944).
The rights of this work are transferred to the extent transferable according to title 17 §105 U.S.C.
Copyright information
© 2015 Springer International Publishing Switzerland (outside the US)
About this paper
Cite this paper
Scogland, T.R.W., Gyllenhaal, J., Keasler, J., Hornung, R., de Supinski, B.R. (2015). Enabling Region Merging Optimizations in OpenMP. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_13
DOI: https://doi.org/10.1007/978-3-319-24595-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24594-2
Online ISBN: 978-3-319-24595-9
eBook Packages: Computer Science (R0)