Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct

Pereira, Romain; Martin, Maël; Roussel, Adrien; Carribault, Patrick; Gautier, Thierry

doi:10.1007/978-3-031-40744-4_5

Romain Pereira^12,14,
Maël Martin^12,13,
Adrien Roussel^12,13,
Patrick Carribault^12,13 &
…
Thierry Gautier¹⁴

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14114))

Included in the following conference series:

International Workshop on OpenMP

445 Accesses

Abstract

Many-core and heterogeneous architectures now require programmers to compose multiple asynchronous programming model to fully exploit hardware capabilities. As a shared-memory parallel programming model, OpenMP has the responsibility of orchestrating the suspension and progression of asynchronous operations occurring on a compute node, such as MPI communications or CUDA/HIP streams. Yet, specifications only come with the task detach(event) API to suspend tasks until an asynchronous operation is completed, which presents a few drawbacks. In this paper, we introduce the design and implementation of an extension on the taskwait construct to suspend a task until an asynchronous event completion. It aims to reduce runtime costs induced by the current solution, and to provide a standard API to automate portable task suspension solutions. The results show twice less overheads compared to the existing task detach clause.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 49.99; Price excludes VAT (USA)

Softcover Book: USD 64.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Enhancing OpenMP Tasking Model: Performance and Portability

Approaches for Task Affinity in OpenMP

ALPI: Enhancing Portability and Interoperability of Task-Aware Libraries

Notes

References

Bak, S., et al.: OpenMP application experiences: porting to accelerated nodes. Parallel Comput. 109, 102856 (2022). https://doi.org/10.1016/j.parco.2021.102856
Article Google Scholar
Carbonneaux, Q., Hoffmann, J., Ramananandro, T., Shao, Z.: End-to-End Verification of Stack-Space Bounds for C Programs. In: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI 2014, New York, NY, USA, pp. 270–281. Association for Computing Machinery (2014). https://doi.org/10.1145/2594291.2594301
Ferat, M., Pereira, R., Roussel, A., Carribault, P., Steffenel, L.A., Gautier, T.: Enhancing MPI+OpenMP task based applications for heterogeneous architectures with GPU Support. In: Klemm, M., de Supinski, B.R., Klinkenberg, J., Neth, B. (eds.) OpenMP in a Modern World: From Multi-device Support to Meta Programming, pp. 3–16. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-15922-0_1
Chapter Google Scholar
Grospellier, G., Lelandais, B.: The Arcane Development Framework. In: Proceedings of the 8th Workshop on Parallel/High-Performance Object-Oriented Scientific Computing. POOSC 2009, New York, NY, USA. Association for Computing Machinery (2009). https://doi.org/10.1145/1595655.1595659
Iwasaki, S., Amer, A., Taura, K., Seo, S., Balaji, P.: BOLT: optimizing OpenMP parallel regions with user-level threads. In: 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 29–42 (2019). https://doi.org/10.1109/PACT.2019.00011
Kale, V., Lu, W., Curtis, A., Malik, A.M., Chapman, B., Hernandez, O.: Toward supporting multi-GPU targets via taskloop and user-defined schedules. In: Milfeld, K., de Supinski, B.R., Koesterke, L., Klinkenberg, J. (eds.) IWOMP 2020. LNCS, vol. 12295, pp. 295–309. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58144-2_19
Chapter Google Scholar
Karlin, I.: LULESH programming model and performance ports overview. Technical report, December 2012. https://doi.org/10.2172/1059462
Klabnik, S., Nichols, C.: The Rust Programming Language. No Starch Press, USA (2018)
Google Scholar
Lattner, C., et al.: MLIR: Scaling compiler infrastructure for domain specific computation. In: 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 2–14 (2021). https://doi.org/10.1109/CGO51591.2021.9370308
Lelandais, B., Oudot, M.P., Combemale, B.: Fostering metamodels and grammars within a dedicated environment for HPC: the NabLab environment (Tool Demo). In: Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering. SLE 2018, New York, NY, USA, pp. 200–204. Association for Computing Machinery (2018). https://doi.org/10.1145/3276604.3276620
Louboutin, M., et al.: Devito (v3.1.0): an embedded domain-specific language for finite differences and geophysical exploration. Geosci. Model Dev. 12(3), 1165–1187 (2019). https://doi.org/10.5194/gmd-12-1165-2019
Article Google Scholar
Lu, H., Seo, S., Balaji, P.: MPI+ULT: overlapping communication and computation with user-level threads. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp. 444–454 (2015). https://doi.org/10.1109/HPCC-CSS-ICESS.2015.82
Luporini, F., et al.: Architecture and performance of devito, a system for automated stencil computation. ACM Trans. Math. Softw. 46(1) (2020). https://doi.org/10.1145/3374916
Meadows, L., Ishikawa, K.: OpenMP tasking and MPI in a Lattice QCD benchmark. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 77–91. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_6
Chapter Google Scholar
Murai, H., Nakao, M., Sato, M.: XcalableMP programming model and language. In: Sato, M. (ed.) XcalableMP PGAS Programming Language, pp. 1–71. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-7683-6_1
Chapter Google Scholar
Pereira, R., Roussel, A., Carribault, P., Gautier, T.: Communication-aware task scheduling strategy in hybrid MPI+OpenMP applications. In: McIntosh-Smith, S., de Supinski, B.R., Klinkenberg, J. (eds.) IWOMP 2021. LNCS, vol. 12870, pp. 197–210. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-85262-7_14
Chapter Google Scholar
Perez, J.M., Beltran, V., Labarta, J., Ayguadé, E.: Improving the integration of task nesting and dependencies in OpenMP. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 809–818 (2017). https://doi.org/10.1109/IPDPS.2017.69
Protze, J., Hermanns, M.A., Demiralp, A., Müller, M.S., Kuhlen, T.: MPI detach - asynchronous local completion. In: Proceedings of the 27th European MPI Users’ Group Meeting. EuroMPI/USA 2020, New York, NY, USA, pp. 71–80. Association for Computing Machinery (2020). https://doi.org/10.1145/3416315.3416323
Richard, J., Latu, G., Bigot, J., Gautier, T.: Fine-Grained MPI+OpenMP plasma simulations: communication overlap with dependent tasks. In: Yahyapour, R. (ed.) Euro-Par 2019. LNCS, vol. 11725, pp. 419–433. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29400-7_30
Chapter Google Scholar
Sala, K., Teruel, X., Perez, J.M., Peña, A.J., Beltran, V., Labarta, J.: Integrating blocking and non-blocking MPI primitives with task-based programming models. Parallel Comput. 85, 153–166 (2019). https://doi.org/10.1016/j.parco.2018.12.008
Article Google Scholar
Schuchart, J., Samfass, P., Niethammer, C., Gracia, J., Bosilca, G.: Callback-based completion notification using MPI Continuations. Parallel Comput. 106, 102793 (2021). https://doi.org/10.1016/j.parco.2021.102793
Article Google Scholar
Schuchart, J., Tsugane, K., Gracia, J., Sato, M.: The impact of taskyield on the design of tasks communicating through MPI. In: de Supinski, B.R., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds.) IWOMP 2018. LNCS, vol. 11128, pp. 3–17. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98521-3_1
Chapter Google Scholar
Tian, S., Doerfert, J., Chapman, B.: Concurrent execution of deferred OpenMP target tasks with hidden helper threads. In: Chapman, B., Moreira, J. (eds.) Languages and Compilers for Parallel Computing, pp. 41–56. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-95953-1_4
Chapter Google Scholar
Trott, C.R., et al.: Kokkos 3: programming model extensions for the exascale era. IEEE Trans. Parallel Distrib. Syst. 33(4), 805–817 (2022). https://doi.org/10.1109/TPDS.2021.3097283
Article MathSciNet Google Scholar
Véstias, M., Neto, H.: Trends of CPU, GPU and FPGA for high-performance computing. In: 2014 24th International Conference on Field Programmable Logic and Applications (FPL), pp. 1–6 (2014). https://doi.org/10.1109/FPL.2014.6927483

Download references

Acknowledgments

This preprint has not undergone peer review (when applicable) or any post-submission improvements or correction. The Version of Record of this contribution is published in IWOMP 2023 and is available online at https://doi.org/<DOI>

Author information

Authors and Affiliations

CEA, DAM, DIF, 91297, Arpajon, France
Romain Pereira, Maël Martin, Adrien Roussel & Patrick Carribault
Université Paris-Saclay, CEA, Laboratoire en Informatique Haute Performance pour le Calcul et la simulation, 91680, Bruyères-le-Châtel, France
Maël Martin, Adrien Roussel & Patrick Carribault
Project Team AVALON INRIA, LIP, ENS-Lyon, Lyon, France
Romain Pereira & Thierry Gautier

Authors

Romain Pereira
View author publications
You can also search for this author in PubMed Google Scholar
Maël Martin
View author publications
You can also search for this author in PubMed Google Scholar
Adrien Roussel
View author publications
You can also search for this author in PubMed Google Scholar
Patrick Carribault
View author publications
You can also search for this author in PubMed Google Scholar
Thierry Gautier
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Romain Pereira .

Editor information

Editors and Affiliations

University of Bristol, Bristol, UK
Simon McIntosh-Smith
OpenMP ARB, Beaverton, OR, USA
Michael Klemm
Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
University of Bristol, Bristol, UK
Tom Deakin
RWTH Aachen University, Aachen, Germany
Jannis Klinkenberg

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Pereira, R., Martin, M., Roussel, A., Carribault, P., Gautier, T. (2023). Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct. In: McIntosh-Smith, S., Klemm, M., de Supinski, B.R., Deakin, T., Klinkenberg, J. (eds) OpenMP: Advanced Task-Based, Device and Compiler Programming. IWOMP 2023. Lecture Notes in Computer Science, vol 14114. Springer, Cham. https://doi.org/10.1007/978-3-031-40744-4_5

Download citation

DOI: https://doi.org/10.1007/978-3-031-40744-4_5
Published: 01 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40743-7
Online ISBN: 978-3-031-40744-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct