Abstract
Some high-performance parallel applications (e.g., simulation codes) are inherently prone to computational imbalance. As elements such as particles or multiple materials evolve in a fixed space (with different boundary conditions), an MPI process can easily end up with more operations to perform than its neighbors. This computational imbalance causes performance loss, and load-balancing methods are used to limit its negative impact. However, most load-balancing schemes rely on shared-memory models, and those that handle MPI load-balancing involve machinery too heavy for efficient intra-node load-balancing. In this paper, we present the MPI Workshare concept: a directive-based programming interface, and its associated implementation, that enables lightweight intra-node load-balancing between MPI processes. In this work, we focus on loop worksharing. The similarity of our directives to OpenMP ones makes the interface easy to understand and use. We provide an implementation of both the runtime and the compiler directive support. Experimental results on well-known mini-applications (MiniFE, LULESH) show that MPI Workshare maintains the same level of performance as well-balanced workloads, even for high values of the imbalance parameter.
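To illustrate the problem the abstract describes, the following is a minimal toy simulation (not the authors' implementation, and using hypothetical cost values): iterations of a shared loop have uneven costs, and we compare the makespan of a static block split across ranks against a dynamic scheme in which the earliest-free rank repeatedly takes the next small chunk, emulating intra-node loop worksharing.

```python
import heapq

def static_split(costs, nranks):
    """Makespan when each rank owns a contiguous block of iterations."""
    n = len(costs)
    chunk = (n + nranks - 1) // nranks
    return max(sum(costs[r * chunk:(r + 1) * chunk]) for r in range(nranks))

def workshared(costs, nranks, chunk=4):
    """Makespan when ranks repeatedly grab small chunks from a shared
    pool, emulating dynamic loop worksharing inside a node."""
    # (finish_time, rank) heap: the earliest-free rank takes the next chunk
    heap = [(0.0, r) for r in range(nranks)]
    for i in range(0, len(costs), chunk):
        t, r = heapq.heappop(heap)
        heapq.heappush(heap, (t + sum(costs[i:i + chunk]), r))
    return max(t for t, _ in heap)

# Imbalanced workload: the first quarter of the iterations is 10x
# heavier, as when particles cluster in one subdomain.
costs = [10.0] * 32 + [1.0] * 96
print(static_split(costs, 4))  # one rank owns all the heavy iterations
print(workshared(costs, 4))    # heavy chunks spread across idle ranks
```

Under this toy model the static split is bounded by the overloaded rank, while the dynamic scheme approaches the ideal makespan (total work divided by the number of ranks); the paper's directive-based approach targets the same effect without rewriting the loop by hand.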
Acknowledgments
This work was performed under the Exascale Computing Research collaboration, with the support of CEA and UVSQ.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dionisi, T., Bouhrour, S., Jaeger, J., Carribault, P., Pérache, M. (2021). Enhancing Load-Balancing of MPI Applications with Workshare. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science(), vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_29
DOI: https://doi.org/10.1007/978-3-030-85665-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85664-9
Online ISBN: 978-3-030-85665-6
eBook Packages: Computer Science, Computer Science (R0)