Abstract
Many-core and heterogeneous architectures now require programmers to compose multiple asynchronous programming models to fully exploit hardware capabilities. As a shared-memory parallel programming model, OpenMP is responsible for orchestrating the suspension and progression of asynchronous operations occurring on a compute node, such as MPI communications or CUDA/HIP streams. Yet the specification only provides the task detach(event) API to suspend tasks until an asynchronous operation completes, which presents a few drawbacks. In this paper, we introduce the design and implementation of an extension to the taskwait construct that suspends a task until an asynchronous event completes. It aims to reduce the runtime costs induced by the current solution and to provide a standard API for automating portable task suspension. The results show half the overhead of the existing task detach clause.
Acknowledgments
This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution is published in IWOMP 2023 and is available online at https://doi.org/<DOI>
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pereira, R., Martin, M., Roussel, A., Carribault, P., Gautier, T. (2023). Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct. In: McIntosh-Smith, S., Klemm, M., de Supinski, B.R., Deakin, T., Klinkenberg, J. (eds) OpenMP: Advanced Task-Based, Device and Compiler Programming. IWOMP 2023. Lecture Notes in Computer Science, vol 14114. Springer, Cham. https://doi.org/10.1007/978-3-031-40744-4_5
DOI: https://doi.org/10.1007/978-3-031-40744-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40743-7
Online ISBN: 978-3-031-40744-4
eBook Packages: Computer Science, Computer Science (R0)