Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct

  • Conference paper
OpenMP: Advanced Task-Based, Device and Compiler Programming (IWOMP 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14114))

Abstract

Many-core and heterogeneous architectures now require programmers to compose multiple asynchronous programming models to fully exploit hardware capabilities. As a shared-memory parallel programming model, OpenMP is responsible for orchestrating the suspension and progression of asynchronous operations occurring on a compute node, such as MPI communications or CUDA/HIP streams. Yet the specification only provides the task detach(event) API to suspend tasks until an asynchronous operation completes, which has a few drawbacks. In this paper, we introduce the design and implementation of an extension of the taskwait construct that suspends a task until an asynchronous event completes. It aims to reduce the runtime costs induced by the current solution and to provide a standard API for automating portable task suspension. The results show half the overhead compared to the existing task detach clause.

Notes

  1. https://www.top500.org/lists/top500.

  2. https://gitlab.inria.fr/ropereir/iwomp23.

  3. https://github.com/mpiwg-hybrid/hybrid-issues/issues/6.

  4. https://github.com/devreal/ompi/tree/mpi-continue-master.

  5. https://github.com/RWTH-HPC/mpi-detach/blob/master/detach.cpp#L66.

  6. https://gitlab.inria.fr/ropereir/iwomp23/-/blob/main/bench/taskwait-detach.c.

  7. https://github.com/llvm/llvm-project/issues/61499.

  8. https://github.com/rust-lang/rfcs/blob/master/text/0230-remove-runtime.md.

  9. https://github.com/rust-lang/rfcs/blob/master/text/2033-experimental-coroutines.md.

  10. https://doc.rust-lang.org/std/ops/trait.Generator.html.

Acknowledgments

This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution is published in IWOMP 2023 and is available online at https://doi.org/<DOI>

Author information

Corresponding author

Correspondence to Romain Pereira.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Pereira, R., Martin, M., Roussel, A., Carribault, P., Gautier, T. (2023). Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct. In: McIntosh-Smith, S., Klemm, M., de Supinski, B.R., Deakin, T., Klinkenberg, J. (eds) OpenMP: Advanced Task-Based, Device and Compiler Programming. IWOMP 2023. Lecture Notes in Computer Science, vol 14114. Springer, Cham. https://doi.org/10.1007/978-3-031-40744-4_5

  • DOI: https://doi.org/10.1007/978-3-031-40744-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40743-7

  • Online ISBN: 978-3-031-40744-4

  • eBook Packages: Computer Science (R0)
