Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures

Clet-Ortega, Jérôme; Carribault, Patrick; Pérache, Marc

doi:10.1007/978-3-319-09873-9_50

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8632)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

The current generation of high-performance computing platforms tends to feature a large number of cores, so applications have to expose fine-grained parallelism to run efficiently. Since version 3.0, the OpenMP standard has provided a way to express such parallelism through tasks. Because the task scheduling strategy is implementation-defined, each runtime may behave differently and reach a different efficiency. However, the hierarchical structure of current parallel computing systems is rarely taken into account, which can lead to a loss of performance on large multicore NUMA systems. This paper studies several task scheduling algorithms using a configurable scheduler. The scheduler relies on a topology-aware, tree-based representation of the computing platform to orchestrate the execution and load balancing of OpenMP tasks. Advanced users can select the task-list granularity according to the tree structure and choose the most suitable work-stealing strategy. One of these strategies takes data locality into account by exploiting the hierarchical view. On unbalanced codes from the BOTS benchmarks, it performs well compared to the Intel and GNU OpenMP runtimes on 16-core and 128-core systems.
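The abstract describes two ideas: task lists attached at a selectable level of a topology tree, and a work-stealing policy that uses that tree. This Python sketch illustrates both ideas in generic form under stated assumptions. It models a hypothetical three-level machine (machine, NUMA node, core). Owners pop their newest task (LIFO) and thieves take the oldest one (FIFO). An idle core searches its own NUMA node before widening to the whole machine. All names (`Node`, `build_machine`, `attach_lists`, `own_list`, `next_task`) are hypothetical. They are not the authors' scheduler or any OpenMP runtime API, and real runtimes would also need thread-safe lists.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """One level of a hypothetical topology tree: machine, NUMA node or core."""
    name: str
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)
    tasks: deque | None = None  # set only at the chosen task-list level

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def build_machine(numa_nodes: int, cores_per_node: int) -> tuple[Node, list[Node]]:
    """Build machine -> NUMA nodes -> cores and return (root, cores)."""
    root = Node("machine")
    cores = []
    for n in range(numa_nodes):
        numa = root.add(Node(f"numa{n}"))
        for c in range(cores_per_node):
            cores.append(numa.add(Node(f"core{n * cores_per_node + c}")))
    return root, cores


def attach_lists(root: Node, depth: int) -> None:
    """Give each node at `depth` its own task list (0 = one global list)."""
    level = [root]
    for _ in range(depth):
        level = [child for node in level for child in node.children]
    for node in level:
        node.tasks = deque()


def own_list(core: Node) -> deque:
    """Nearest task list at or above `core`: the list this core pushes to."""
    node = core
    while node.tasks is None:
        node = node.parent
    return node.tasks


def lists_under(node: Node) -> list[deque]:
    """All task lists in the subtree rooted at `node`, depth-first."""
    if node.tasks is not None:
        return [node.tasks]
    return [q for child in node.children for q in lists_under(child)]


def next_task(core: Node):
    """Take local work first; otherwise steal, widening the search level by level."""
    mine = own_list(core)
    if mine:
        return mine.pop()  # owner side: newest task (LIFO)
    node = core
    while node.parent is not None:
        node = node.parent  # widen: own NUMA node first, then the whole machine
        for victim in lists_under(node):
            if victim is not mine and victim:
                return victim.popleft()  # thief side: oldest task (FIFO)
    return None  # no work anywhere in the tree


root, cores = build_machine(numa_nodes=2, cores_per_node=2)
attach_lists(root, depth=2)            # one list per core
own_list(cores[1]).extend(["a", "b"])  # work queued on numa0
own_list(cores[3]).append("c")         # work queued on numa1
print(next_task(cores[2]))  # c: stolen from core3, same NUMA node
print(next_task(cores[0]))  # a: oldest task of core1, same NUMA node
print(next_task(cores[2]))  # b: numa1 is empty, so the thief crosses to numa0
```

Calling `attach_lists(root, depth=1)` instead gives one list shared by all cores of a NUMA node, and `depth=0` gives a single global list. In this sketch, the tree level chosen for the lists is the granularity knob. The bottom-up search order is what keeps steals close to the thief's own data where possible.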

Author information

Authors and Affiliations

CEA, DAM, DIF, F-91297, Arpajon, France
Jérôme Clet-Ortega, Patrick Carribault & Marc Pérache

Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, Universidade do Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Fernando Silva, Inês Dutra & Vítor Santos Costa

About this paper

Cite this paper

Clet-Ortega, J., Carribault, P., Pérache, M. (2014). Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_50

DOI: https://doi.org/10.1007/978-3-319-09873-9_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)