Abstract
State-of-the-art programming approaches generally maintain a strict division between intra-node shared-memory parallelism and inter-node MPI communication. Tasking with dependencies offers a clean, dependable abstraction for a wide range of hardware and situations within a node, but research on task offloading between nodes is still relatively immature. This paper presents a flexible task offloading extension of the OmpSs-2 programming model, which inherits task ordering from a sequential version of the code and uses a common address space to avoid address translation and simplify the use of data structures with pointers. It employs weak dependencies so that work can be created concurrently. The program executes in a distributed dataflow fashion, and the runtime system overlaps construction of the distributed dependency graph with dependency enforcement, data transfer, and task execution. Asynchronous task parallelism avoids the synchronization often required by MPI+OpenMP tasking. Task scheduling is flexible, and data location is tracked through the dependencies. To enable future work on resiliency, scalability, load balancing, and malleability, we release all source code and examples as open source.
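The following is a minimal sketch of the programming style the abstract describes, using the task, dependency, and weak-dependency pragma syntax documented in the OmpSs-2 specification [8]. The array, block size, and function names are illustrative rather than taken from the paper, and the plain C arrays stand in for memory that OmpSs-2@Cluster would allocate in its common address space through the Nanos6 cluster API.

```c
#include <stddef.h>

#define N  1024
#define BS 256

/* Initialize one block. The strong out() dependency names exactly the
 * region this task writes, so any later reader is ordered after it. */
static void init_block(double *block, size_t n)
{
    for (size_t i = 0; i < n; i++)
        block[i] = 1.0;
}

/* Outer task body: the task was created with weakout() over the whole
 * array, so the runtime may place it (or, under OmpSs-2@Cluster, offload
 * it to another node) before any subtask exists. The weak clause only
 * promises that subtasks will access subregions of x; it does not itself
 * delay the outer task. */
static void init_all(double *x)
{
    for (size_t b = 0; b < N; b += BS) {
        #pragma oss task out(x[b;BS])
        init_block(&x[b], BS);
    }
}

int main(void)
{
    static double x[N], y[N];

    #pragma oss task weakout(x[0;N])
    init_all(x);

    /* A sibling task reading x[0;N]: the runtime links it to the nested
     * writer tasks through the dependency graph, with no explicit barrier
     * between task creation and execution. */
    #pragma oss task in(x[0;N]) out(y[0;N])
    for (size_t i = 0; i < N; i++)
        y[i] = x[i];

    #pragma oss taskwait
    return 0;
}
```

Because the outer task carries only weak dependencies, the runtime can create and place it immediately and continue building the dependency graph for the sibling copy task while the nested writers are still being spawned; this is the overlap of graph construction with data transfer and execution that the abstract refers to.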
Notes
- 1. Currently, nowait is available through a Nanos6 API call.
References
Aguilar Mena, J., Shaaban, O., Beltran, V., Carpenter, P., Ayguade, E., Labarta Mancho, J.: Artifact and instructions to generate experimental results for the Euro-Par 2022 paper: “OmpSs-2@Cluster: Distributed memory execution of nested OpenMP-style tasks” (2022). https://doi.org/10.6084/m9.figshare.19960721
Álvarez, D., Sala, K., Maroñas, M., Roca, A., Beltran, V.: Advanced synchronization techniques for task-based runtime systems. In: Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2021), pp. 334–347. Association for Computing Machinery, New York, NY, USA (2021). https://doi.org/10.1145/3437801.3441601
Augonnet, C., Aumage, O., Furmento, N., Namyst, R., Thibault, S.: StarPU-MPI: task programming over clusters of machines enhanced with accelerators. In: European MPI Users’ Group Meeting, pp. 298–299. Springer, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33518-1_40
Barcelona Supercomputing Center: MareNostrum 4 system architecture (2017). https://www.bsc.es/marenostrum/marenostrum/technical-information
Barcelona Supercomputing Center: Mercurium (2021). https://pm.bsc.es/mcxx
Barcelona Supercomputing Center: Nanos6 (2021). https://github.com/bsc-pm/nanos6
Barcelona Supercomputing Center: OmpSs-2 releases (2021). https://github.com/bsc-pm/ompss-releases
Barcelona Supercomputing Center: OmpSs-2 specification (2021). https://pm.bsc.es/ftp/ompss-2/doc/spec/
Barcelona Supercomputing Center: OmpSs-2@Cluster releases (2022). https://github.com/bsc-pm/ompss-2-cluster-releases
Bauer, M., Treichler, S., Slaughter, E., Aiken, A.: Legion: expressing locality and independence with logical regions. In: SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–11 (2012). https://doi.org/10.1109/SC.2012.71
Beckingsale, D.A., et al.: RAJA: portable performance for large-scale scientific applications. In: IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC) (2019). https://doi.org/10.1109/P3HPC49587.2019.00012
Blumofe, R., Joerg, C., Kuszmaul, B., Leiserson, C., Randall, K., Zhou, Y.: Cilk: an efficient multithreaded runtime system. J. Parallel Distrib. Comput. 37(1), 55–69 (1996). https://doi.org/10.1006/jpdc.1996.0107
Bueno, J., et al.: Productive programming of GPU clusters with OmpSs. In: IEEE 26th International Parallel and Distributed Processing Symposium (2012). https://doi.org/10.1109/IPDPS.2012.58
Charles, P., et al.: X10: an object-oriented approach to non-uniform cluster computing. In: Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pp. 519–538. OOPSLA 2005. Association for Computing Machinery, New York, NY, USA (2005). https://doi.org/10.1145/1094811.1094852
Deelman, E., et al.: Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci. Program. 13(3), 219–237 (2005). https://doi.org/10.1155/2005/128026
Faxén, K.F.: Wool user’s guide. Technical report, Swedish Institute of Computer Science (2009)
Fürlinger, K., et al.: DASH: distributed data structures and parallel algorithms in a global address space. In: Software for Exascale Computing-SPPEXA 2016–2019, pp. 103–142. Springer International Publishing (2020). https://doi.org/10.1007/978-3-030-47956-5_6
Hoque, R., Herault, T., Bosilca, G., Dongarra, J.: Dynamic task discovery in PaRSEC: a data-flow task-based runtime. In: Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 1–8 (2017). https://doi.org/10.1145/3148226.3148233
Kaiser, H., Heller, T., Adelstein-Lelbach, B., Serio, A., Fey, D.: HPX: a task based programming model in a global address space. In: 8th International Conference on Partitioned Global Address Space Programming Models (2014). https://doi.org/10.13140/2.1.2635.5204
Klinkenberg, J., Samfass, P., Bader, M., Terboven, C., Müller, M.: CHAMELEON: reactive load balancing for hybrid MPI+OpenMP task-parallel applications. J. Parallel Distrib. Comput. 138 (2019). https://doi.org/10.1016/j.jpdc.2019.12.005
Leiserson, C.E.: The Cilk++ concurrency platform. J. Supercomput. 51(3), 244–257 (2010). https://doi.org/10.1007/s11227-010-0405-3
Lordan, F., et al.: ServiceSs: an interoperable programming framework for the cloud. J. Grid Comput. 12(1), 67–91 (2013). https://doi.org/10.1007/s10723-013-9272-5
OpenMP Architecture Review Board: OpenMP 4.0 complete specifications, July 2013
Papakonstantinou, N., Zakkak, F.S., Pratikakis, P.: Hierarchical parallel dynamic dependence analysis for recursively task-parallel programs. In: 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 933–942 (2016). https://doi.org/10.1109/IPDPS.2016.53
Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign: Charm++ documentation. https://charm.readthedocs.io/en/latest/index.html
Perez, J.M., Beltran, V., Labarta, J., Ayguadé, E.: Improving the integration of task nesting and dependencies in OpenMP. In: 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 809–818 (2017). https://doi.org/10.1109/IPDPS.2017.69
Pérez, J., Badia, R.M., Labarta, J.: A dependency-aware task-based programming environment for multi-core architectures. In: 2008 IEEE International Conference on Cluster Computing, pp. 142–151 (2008). https://doi.org/10.1109/CLUSTR.2008.4663765
Rotaru, T., Rahn, M., Pfreundt, F.J.: MapReduce in GPI-Space. In: Euro-Par 2013: Parallel Processing Workshops, pp. 43–52. Springer, Berlin, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54420-0_5
Sala, K., Macià, S., Beltran, V.: Combining one-sided communications with task-based programming models. In: 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 528–541 (2021). https://doi.org/10.1109/Cluster48925.2021.00024
Sala, K., Teruel, X., Perez, J.M., Peña, A.J., Beltran, V., Labarta, J.: Integrating blocking and non-blocking MPI primitives with task-based programming models. Parallel Comput. 85, 153–166 (2019). https://doi.org/10.1016/j.parco.2018.12.008
Sergent, M., Goudin, D., Thibault, S., Aumage, O.: Controlling the memory subscription of distributed applications with a task-based runtime system. In: 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 318–327 (2016). https://doi.org/10.1109/IPDPSW.2016.105
Tillenius, M.: SuperGlue: a shared memory framework using data versioning for dependency-aware task-based parallelization. SIAM J. Sci. Comput. 37(6) (2015). https://doi.org/10.1137/140989716
Tzenakis, G., Papatriantafyllou, A., Kesapides, J., Pratikakis, P., Vandierendonck, H., Nikolopoulos, D.S.: BDDT: Block-level dynamic dependence analysis for deterministic task-based parallelism. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. PPoPP 2012, vol. 47, pp. 301–302. Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2145816.2145864
Virouleau, P., Broquedis, F., Gautier, T., Rastello, F.: Using data dependencies to improve task-based scheduling strategies on NUMA architectures. In: European Conference on Parallel Processing, pp. 531–544. Springer (2016). https://doi.org/10.1007/978-3-319-43659-3_39
Zafari, A., Larsson, E., Tillenius, M.: DuctTeip: an efficient programming model for distributed task-based parallel computing. Parallel Comput. (2019). https://doi.org/10.1016/j.parco.2019.102582
Acknowledgements and Data Availability
The datasets and code generated during and/or analyzed during the current study are available in the Figshare repository: [1]. This research has received funding from the European Union’s Horizon 2020/EuroHPC research and innovation programme under grant agreement No 955606 (DEEP-SEA) and 754337 (EuroEXA). It is supported by the Spanish State Research Agency - Ministry of Science and Innovation (contract PID2019-107255GB and Ramon y Cajal fellowship RYC2018-025628-I) and by the Generalitat de Catalunya (2017-SGR-1414).
About this paper
Cite this paper
Aguilar Mena, J., Shaaban, O., Beltran, V., Carpenter, P., Ayguade, E., Labarta Mancho, J. (2022). OmpSs-2@Cluster: Distributed Memory Execution of Nested OpenMP-style Tasks. In: Cano, J., Trinder, P. (eds) Euro-Par 2022: Parallel Processing. Euro-Par 2022. Lecture Notes in Computer Science, vol 13440. Springer, Cham. https://doi.org/10.1007/978-3-031-12597-3_20
DOI: https://doi.org/10.1007/978-3-031-12597-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-12596-6
Online ISBN: 978-3-031-12597-3