Abstract
In the past decade, heterogeneous multicore architectures that support Single Instruction Multiple Thread (SIMT) style computing have become the standard platform for scheduling HPC applications. Here, applications are typically modelled as a set of data-parallel tasks whose dependencies are represented as a directed acyclic graph (DAG). The execution time of each constituent task in the DAG is known beforehand and is leveraged by scheduling algorithms (list or cluster based) to ascertain near-optimal schedules at runtime. However, in an online setting, where applications are submitted by multiple users and the types of applications are not restricted, execution time information is unlikely to be available for every program. In this context, we propose a class of intelligent algorithms for heterogeneous CPU-GPU platforms that leverage static analysis-assisted machine learning techniques to decide how device assignments should be made at runtime, thus bypassing the need for expensive offline profiling passes. We formalize relevant task-level ranking metrics and discuss how existing scheduling techniques can be adapted for our proposed class of algorithms. We also devise an online cluster scheduling algorithm that supports dynamic task arrival by determining, in any given scheduling epoch, mapping decisions for a subset of tasks in a DAG. We perform a detailed comparative analysis of our proposed cluster and list scheduling heuristics through extensive simulation experiments on a variety of heterogeneous multicore platform configurations, and observe speedups in the range of 1.1–1.5× for cluster scheduling over list scheduling.
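To make the list-scheduling setting concrete, the following is a minimal sketch of a generic rank-based list scheduler on a two-device (CPU, GPU) platform. It is not the algorithm proposed in the article: the four-task DAG, the cost table, and the function names `upward_ranks` and `list_schedule` are hypothetical, the ranking is the familiar upward rank computed from average execution times, and communication costs are ignored. The fixed cost table stands in for the execution time information that the abstract says is usually assumed to be known beforehand.

```python
from functools import lru_cache

# Hypothetical 4-task DAG: task -> list of successor tasks.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Hypothetical execution time estimates per device (arbitrary units).
cost = {
    "A": {"cpu": 4, "gpu": 2},
    "B": {"cpu": 3, "gpu": 6},
    "C": {"cpu": 8, "gpu": 2},
    "D": {"cpu": 2, "gpu": 2},
}


def upward_ranks(succ, cost):
    """Rank of a task = its average cost + the largest rank among its successors."""

    @lru_cache(maxsize=None)
    def rank(task):
        avg = sum(cost[task].values()) / len(cost[task])
        return avg + max((rank(s) for s in succ[task]), default=0.0)

    return {task: rank(task) for task in succ}


def list_schedule(succ, cost):
    """Visit tasks by decreasing rank; place each on the device that finishes it earliest."""
    ranks = upward_ranks(succ, cost)

    # Invert the successor map to obtain the predecessors of every task.
    pred = {task: [] for task in succ}
    for task, successors in succ.items():
        for s in successors:
            pred[s].append(task)

    device_free = {dev: 0.0 for dev in next(iter(cost.values()))}
    finish, placement = {}, {}

    # With strictly positive costs, decreasing rank is a valid topological order.
    for task in sorted(succ, key=lambda t: -ranks[t]):
        ready = max((finish[p] for p in pred[task]), default=0.0)
        best = min(
            device_free,
            key=lambda dev: max(ready, device_free[dev]) + cost[task][dev],
        )
        start = max(ready, device_free[best])
        finish[task] = start + cost[task][best]
        device_free[best] = finish[task]
        placement[task] = best

    return placement, max(finish.values())


placement, makespan = list_schedule(succ, cost)
print(placement, makespan)
```

In an online setting such as the one the abstract describes, the entries of `cost` would not come from a profiling table; a predictor would have to supply them, or supply the device choice directly.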












Cite this article
Ghose, A., Dey, S. FGFS: Feature Guided Frontier Scheduling for SIMT DAGs. J Supercomput 78, 11702–11743 (2022). https://doi.org/10.1007/s11227-022-04323-8
DOI: https://doi.org/10.1007/s11227-022-04323-8