
Reactive Task Migration for Hybrid MPI+OpenMP Applications

  • Conference paper
  • In: Parallel Processing and Applied Mathematics (PPAM 2019)

Abstract

Many applications in high-performance computing are designed based on underlying performance and execution models. While such models could be employed successfully in the past to balance load within and between compute nodes, modern software and hardware increasingly make performance prediction difficult, if not impossible. Consequently, balancing computational load becomes much harder. To tackle these challenges in search of a general solution, we present a novel library for fine-granular, task-based reactive load balancing in distributed memory based on MPI and OpenMP. With our approach, individual migratable tasks can be executed on any MPI rank; the actual executing rank is determined at run time based on online performance data. We evaluate our approach under an enforced power cap and under enforced clock-frequency changes for a synthetic benchmark, and show its robustness against work-induced imbalances in a realistic application. Our experiments demonstrate speedups of up to \(1.31\times\).
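To make the idea concrete, below is a minimal, hypothetical C++ sketch of such a migratable task: its input and output are plain serializable data, so any MPI rank could execute it, and per-rank load information is gathered at run time to drive migration decisions. All names (MigratableTask, the load exchange via task counts) are illustrative assumptions, not the library's actual API; the paper's approach uses measured online performance data rather than task counts.

```cpp
// Hypothetical sketch of the migratable-task idea from the abstract.
// Build with an MPI C++ compiler and OpenMP, e.g.:
//   mpicxx -fopenmp sketch.cpp -o sketch && mpirun -np 4 ./sketch
#include <mpi.h>
#include <omp.h>
#include <cstddef>
#include <cstdio>
#include <vector>

// A migratable task: plain-old-data input and output, so its payload
// can be shipped between ranks with ordinary MPI messages. The task
// body is identical in every rank's binary; only the data must travel.
struct MigratableTask {
    double input;
    double output;
};

static void execute(MigratableTask &t) {
    t.output = t.input * t.input;  // stand-in for the real computation
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Deliberately imbalanced initial distribution: rank 0 gets more.
    int n_local = (rank == 0) ? 8 : 4;
    std::vector<MigratableTask> tasks(n_local);
    for (int i = 0; i < n_local; ++i)
        tasks[i].input = rank * 100.0 + i;

    // Exchange per-rank load so every rank sees the global picture
    // (the paper's "online performance data" would be measured
    // execution times, not mere task counts as here).
    std::vector<int> loads(size);
    MPI_Allgather(&n_local, 1, MPI_INT, loads.data(), 1, MPI_INT,
                  MPI_COMM_WORLD);

    // A reactive runtime would now migrate task payloads from
    // overloaded to underloaded ranks; omitted here for brevity.

    // Execute whatever is local as OpenMP tasks within the rank.
    #pragma omp parallel
    #pragma omp single
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        #pragma omp task shared(tasks) firstprivate(i)
        execute(tasks[i]);
    }

    std::printf("rank %d executed %zu tasks\n", rank, tasks.size());
    MPI_Finalize();
    return 0;
}
```

The migration step itself is elided; the point of the sketch is the structure that reactive migration plugs into: serializable payloads, a replicated task body, and a run-time exchange of load data.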


Notes

  1. Migration decisions are made on each rank separately, based on per-rank load information that has been exchanged beforehand. Consequently, this step does not require any additional two-sided or collective communication (see the sketch after these notes).

  2. Although we had planned to conduct the tests on our new Intel Xeon Skylake processors, that partition was still being brought into production at the time of writing.
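
Since note 1 carries the communication argument, here is a minimal sketch, assuming per-rank loads were already exchanged (e.g. via MPI_Allgather), of how every rank can reach a consistent migration decision purely locally. The pairing rule shown (the most-loaded rank donates to the least-loaded one above a 10% threshold) is an illustrative assumption, not necessarily the paper's strategy.

```cpp
// Hypothetical sketch of the decentralized decision from note 1: all
// ranks hold the same global load vector (exchanged earlier, e.g. via
// MPI_Allgather) and run the identical deterministic function, so no
// further two-sided or collective communication is needed to agree on
// who migrates tasks to whom. The pairing rule is illustrative only.
#include <algorithm>
#include <numeric>
#include <vector>

// Returns the rank this rank should offload one task to, or -1 if it
// should keep all of its tasks.
int select_migration_target(const std::vector<double> &loads, int my_rank) {
    auto max_it = std::max_element(loads.begin(), loads.end());
    auto min_it = std::min_element(loads.begin(), loads.end());
    int max_rank = static_cast<int>(max_it - loads.begin());
    int min_rank = static_cast<int>(min_it - loads.begin());
    double mean = std::accumulate(loads.begin(), loads.end(), 0.0)
                  / loads.size();
    // Only the most-loaded rank donates, and only if its load exceeds
    // the mean by more than 10%; every rank computes the same answer.
    if (my_rank == max_rank && *max_it > 1.1 * mean)
        return min_rank;
    return -1;
}
```

Because the function is deterministic and its input is identical everywhere, the least-loaded rank can equally predict that it will receive a task and pre-post the matching receive, which is why no extra coordination round is required.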


Acknowledgements

Some of the experiments were performed with computing resources granted by JARA-HPC from RWTH Aachen University under projects jara0001 and nova0027. Parts of this work were funded by the German Federal Ministry of Education and Research (BMBF) under grant numbers 01IH16004B and 01IH16004C (Project Chameleon).

Author information


Correspondence to Jannis Klinkenberg.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Klinkenberg, J., Samfass, P., Bader, M., Terboven, C., Müller, M.S. (2020). Reactive Task Migration for Hybrid MPI+OpenMP Applications. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol. 12044. Springer, Cham. https://doi.org/10.1007/978-3-030-43222-5_6


  • DOI: https://doi.org/10.1007/978-3-030-43222-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43221-8

  • Online ISBN: 978-3-030-43222-5

  • eBook Packages: Computer Science, Computer Science (R0)
