Locality-Aware Scheduling of Independent Tasks for Runtime Systems

Gonthier, Maxime; Marchal, Loris; Thibault, Samuel

doi:10.1007/978-3-031-06156-1_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13098)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or other accelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memory through a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary data movements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms. When processing an application, the scheduler knows all tasks available for processing on a GPU, as well as their input data dependencies. Hence, it is able to order tasks and prefetch their input data in the GPU memory (after possibly evicting some previously-loaded data), while aiming at minimizing data movements, so as to reduce the total processing time. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwise independent) on a GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show that ordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem, and propose a new one based on task aggregation. These strategies have been implemented in the StarPU runtime system. We present their performance on tasks from tiled 2D and 3D matrix products. Our experiments demonstrate that using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the total processing time.
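To make the problem concrete, the following minimal sketch counts how many data loads a given task order causes on a single GPU with a bounded memory. It is an illustration only, not the authors' implementation and not StarPU code. It assumes unit-size data, a memory capacity counted in data items, and an eviction rule in the spirit of Belady's: evict the resident data whose next use lies furthest in the future. The abstract does not spell out its optimal eviction strategy, so this rule is an assumption. The names `count_loads` and `next_use` are hypothetical.

```python
from typing import Dict, Sequence, Set


def next_use(data: str, pos: int, order: Sequence[int],
             inputs: Dict[int, Set[str]]) -> int:
    """Index of the first task after position `pos` that reads `data`.

    Returns len(order) when the data is never used again.
    """
    for i in range(pos + 1, len(order)):
        if data in inputs[order[i]]:
            return i
    return len(order)


def count_loads(order: Sequence[int], inputs: Dict[int, Set[str]],
                capacity: int) -> int:
    """Count loads into a memory of `capacity` unit-size data items.

    Assumed eviction rule (Belady-style): when memory is full, evict the
    resident data, not needed by the current task, whose next use is
    furthest in the future.
    """
    memory: Set[str] = set()
    loads = 0
    for pos, task in enumerate(order):
        needed = inputs[task]
        if len(needed) > capacity:
            raise ValueError("inputs of one task must fit in memory")
        for data in sorted(needed):
            if data in memory:
                continue  # already resident, no transfer required
            if len(memory) >= capacity:
                # Never evict an input of the task being processed.
                candidates = memory - needed
                victim = max(
                    candidates,
                    key=lambda d: (next_use(d, pos, order, inputs), d),
                )
                memory.remove(victim)
            memory.add(data)
            loads += 1
    return loads


# Hypothetical 2x2 tiled matrix product: task (i, j) reads row tile Ai
# and column tile Bj. Tasks are numbered 0..3 in row-major order.
inputs = {
    0: {"A0", "B0"},
    1: {"A0", "B1"},
    2: {"A1", "B0"},
    3: {"A1", "B1"},
}

row_major = count_loads([0, 1, 2, 3], inputs, capacity=2)
snake = count_loads([0, 1, 3, 2], inputs, capacity=2)
print(row_major, snake)  # 6 5
```

With room for only two tiles, the row-major order reloads both inputs when it moves to the second row, whereas the snake-like order keeps one shared tile resident across each step. Even with the eviction rule fixed, the task order alone changes the number of transfers, which is the ordering problem the abstract states is NP-complete.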

Notes

1. The code used to reproducibly obtain the results of this paper is available at https://gitlab.inria.fr/starpu/locality-aware-scheduling/-/tree/coloc2021.

Acknowledgement

This work was supported by the SOLHARIS project (ANR-19-CE46-0009) which is operated by the French National Research Agency (ANR).

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).

Author information

Authors and Affiliations

LIP, CNRS, ENS de Lyon, Inria & Université Claude-Bernard Lyon 1, Lyon, France
Maxime Gonthier & Loris Marchal
LaBRI, University of Bordeaux, CNRS, Inria Bordeaux - Sud-Ouest, Talence, France
Samuel Thibault

Corresponding authors

Correspondence to Maxime Gonthier, Loris Marchal or Samuel Thibault.

Editor information

Editors and Affiliations

University of Lisbon, Lisbon, Portugal
Ricardo Chaves
Department of Computer Engineering, CiTIUS, University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora B. Heras
University of Lisbon, Lisbon, Portugal
Aleksandar Ilic
Koç University, Istanbul, Turkey
Didem Unat
Barcelona Supercomputing Center, Barcelona, Spain
Rosa M. Badia
University of Stirling, Stirling, UK
Andrea Bracciali
Louisiana State University, Baton Rouge, USA
Patrick Diehl
Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA
Anshu Dubey
Ajou University, Suwon, Korea (Republic of)
Oh Sangyoon
Tennessee Technological University, Cookeville, TN, USA
Stephen L. Scott
University of Pisa, Pisa, Italy
Laura Ricci

About this paper

Cite this paper

Gonthier, M., Marchal, L., Thibault, S. (2022). Locality-Aware Scheduling of Independent Tasks for Runtime Systems. In: Chaves, R., et al. Euro-Par 2021: Parallel Processing Workshops. Euro-Par 2021. Lecture Notes in Computer Science, vol 13098. Springer, Cham. https://doi.org/10.1007/978-3-031-06156-1_1

DOI: https://doi.org/10.1007/978-3-031-06156-1_1
Published: 09 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06155-4
Online ISBN: 978-3-031-06156-1
eBook Packages: Computer Science, Computer Science (R0)
