Real-Time Scheduling and Analysis of Parallel Tasks on Heterogeneous Multi-cores

https://doi.org/10.1016/j.sysarc.2019.101704

Abstract

Heterogeneous multi-cores and parallel architectures have recently gained much attention because they exploit the strengths of different architectures to offer higher performance. In this paper, we study the real-time scheduling of directed acyclic graph (DAG) tasks on heterogeneous multi-core platforms. In this setting, a task contains different types of vertices, and the workload of each vertex must execute on its particular type of cores. Prior work schedules such typed DAG tasks with a work-conserving strategy, which leads to pessimistic schedulability tests. To address this, we propose a novel scheduling algorithm for typed DAG tasks that assigns each vertex a varying criticality depending on the vertex's remaining workload. A vertex with higher criticality is more urgent to execute. Under this scheduling strategy, we propose a new worst-case response time (WCRT) bound to verify the schedulability of DAG tasks supporting heterogeneous computing. Experiments with randomly generated workloads show that the accuracy of our new WCRT bound is on average about 20% higher than that of the existing bounds.

Introduction

To meet increasing performance requirements, parallel hardware architectures have become mainstream in the embedded multi-core domain. Parallel programming models are fundamental to exploiting the performance capabilities of these architectures [1], [2], [3]. In recent years, research on parallel task scheduling with real-time constraints has made great progress [4]. Several parallel programming paradigms have been developed to help developers create parallel programs, such as MPI [5] and OpenMP [6], [7], as well as parallel programming languages such as CilkPlus [8]. All these paradigms support intra-task parallelism, in which a single task consists of multiple code parts that can execute in parallel. The DAG task model is a promising way to formulate software with intra-task parallelism. The real-time scheduling and analysis of the DAG parallel task model has gained a lot of attention in the real-time and high-performance computing communities [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18].

Moreover, heterogeneous hardware architectures have received increasing attention [19], [20], [21], [22], [23]. They exploit specialized processing capabilities and can offer higher performance and energy efficiency than homogeneous architectures. In general, a heterogeneous hardware architecture consists of components that are asymmetric in performance and functionality [24], [25]. It integrates low-power general-purpose multi-cores (known as the host) with specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPUs), as in the NVIDIA Tegra X1 [26] or the Xilinx UltraScale [27].

Heterogeneous multiprocessor systems-on-chip (MPSoCs), one class of heterogeneous hardware architecture, have been widely used in many real-time embedded systems. As introduced in [28], [29], [30], MPSoCs can be broadly classified into two categories: performance heterogeneity and functional heterogeneity. Performance heterogeneity integrates cores with the same functionality (i.e., the same instruction set architecture (ISA)) but different power-performance characteristics. Functional heterogeneity intersperses cores with very different functionality (i.e., different ISAs) on the same die. The Jetson TX2 [31] exhibits performance heterogeneity because it adopts the big.LITTLE architecture [32], which integrates high-performance cores (big cores) with low-power cores (LITTLE cores). It contains two Denver cores (high-performance cores) and four ARM Cortex-A57 cores (low-power cores). The Denver and Cortex-A57 cores have different power-performance characteristics, but they are coherent and share the same ISA.

Current parallel programming languages and frameworks increasingly support heterogeneous multi-cores. For example, in OpenMP [6], the proc_bind clause specifies the policy for binding threads to places (sets of processors). In CUDA [33], the cudaSetDevice function selects the device on which subsequent operations of the calling host thread execute. In OpenCL [34], the clCreateCommandQueue function creates a command queue on a specific device.

In this paper, we consider the real-time scheduling of typed DAG tasks on heterogeneous multi-cores, where each vertex is explicitly bound to a specific type of cores for execution. Binding code snippets of a program to a specific type of cores is a common operation in heterogeneous multi-core scheduling. It can easily be implemented in mainstream parallel programming frameworks and operating systems.
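At the operating-system level, Linux provides one way to bind code to a type of cores: the CPU affinity interface, which Python exposes as `os.sched_getaffinity` and `os.sched_setaffinity`. The sketch below is Linux-only and assumes a hypothetical table, `EXAMPLE_CORE_TYPES`, that maps each core type to CPU ids. Which ids correspond to big or LITTLE cores is platform-specific and is not given in the text. The helper name `bind_to_core_type` is also our own.

```python
import os

# Hypothetical core-type table: core type s -> Linux CPU ids of that type.
# The real ids are platform-specific and must be looked up for the actual board.
EXAMPLE_CORE_TYPES = {1: {0, 1}, 2: {2, 3, 4, 5}}


def bind_to_core_type(core_type, core_types):
    """Restrict the calling process to the cores of one type (Linux only).

    Returns the CPU ids actually used: the cores of the requested type
    intersected with the CPUs the process is currently allowed to use.
    """
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    target = set(core_types[core_type]) & allowed
    if not target:
        raise ValueError(f"no permitted CPU of type {core_type!r}")
    os.sched_setaffinity(0, target)  # subsequent work runs only on these CPUs
    return target
```

A caller would invoke, for example, `bind_to_core_type(1, EXAMPLE_CORE_TYPES)` before running the code of a type-1 vertex. It would then restore the previous mask afterwards. Intersecting with the current mask avoids requesting CPUs the process is not permitted to use.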

The real-time scheduling of typed DAG tasks on heterogeneous platforms has been studied in [35], [36], [37]. All of these works schedule DAG tasks with a work-conserving algorithm. Their response time analysis methods are summarized as follows.

Jeffrey et al. [35] proposed the first WCRT bound for the general typed DAG task model, but this bound is very pessimistic. Serrano et al. [36] proposed a response time bound for a restricted typed DAG task model with only two types of cores, which limits its applicability. Han et al. [37] developed two response time bounds. The first dominates Jeffrey's bound [35] in analysis precision. The second improves precision significantly further by exploiting more detailed information about the task graph structure.

Even so, Han's bounds remain very pessimistic. When analyzing the worst-case response time of each path, they account for blocking from same-type vertices on other paths that have already completed or have not yet started, although such vertices cannot actually delay the path. Yang et al. [38] studied the scheduling of typed DAG tasks by decomposing each DAG task into a set of independent subtasks with artificial release times and deadlines.

This paper aims to obtain a more accurate WCRT upper bound for typed DAG tasks. To address the problems of earlier work, we propose a criticality allocation strategy that assigns a criticality to each vertex. The criticality determines how urgently a vertex should be executed, and it decreases as the vertex's remaining workload decreases. Based on this strategy, we propose a new WCRT bound to verify the schedulability of DAG tasks supporting heterogeneous computing. This approach reduces the number of vertices that can potentially execute in parallel with a given vertex to a relatively small range. Experiments with randomly generated workloads show that our proposed criticality allocation strategy and new bound are significantly more precise than the existing bounds.
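This excerpt does not give the paper's exact criticality formula. The sketch below is a purely hypothetical illustration of a criticality that shrinks as a vertex's remaining workload shrinks. It defines criticality(v, e) = (c(v) − e) + the length of the longest path that starts at a successor of v. Here e is the amount of v already executed, and c(v) is the vertex's execution time. This resembles classic critical-path list-scheduling priorities and should not be read as the authors' definition. The names `make_criticality` and `downstream` are our own.

```python
from functools import lru_cache


def make_criticality(wcet, edges):
    """Build a HYPOTHETICAL criticality function for a DAG.

    wcet: vertex -> execution time c(v); edges: (u, v) precedence pairs.
    Assumes the graph is acyclic.
    """
    succs = {v: [] for v in wcet}
    for u, v in edges:
        succs[u].append(v)

    @lru_cache(maxsize=None)
    def downstream(v):
        # Length of the longest path starting at v, counting c(v) itself.
        return wcet[v] + max((downstream(w) for w in succs[v]), default=0)

    def criticality(v, executed=0):
        # Assumed definition: remaining work of v plus the longest path
        # through its successors; it drops by one per unit of v executed.
        remaining = wcet[v] - executed
        if not 0 <= remaining <= wcet[v]:
            raise ValueError("executed must lie in [0, c(v)]")
        return remaining + max((downstream(w) for w in succs[v]), default=0)

    return criticality
```

Under this assumed definition, a scheduler that always dispatches the most critical eligible vertex favors vertices at the head of long remaining chains. A vertex's urgency also falls as it makes progress.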

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the system model. Section 4 reviews the known WCRT upper bounds for the same model and analyzes their problems. Section 5 presents a new scheduling strategy for typed DAG tasks based on criticality allocation. Section 6 describes the response time analysis for this new scheduling strategy. Section 7 presents experimental results comparing our approach with existing WCRT bounds. Finally, Section 8 concludes the paper and highlights some future research directions.

Section snippets

Related work

Parallel and heterogeneous hardware architectures have become mainstream in the embedded real-time domain to cope with increasing performance requirements. Scheduling an application modeled as a DAG, a fundamental abstraction in parallel programming models, is a key problem when aiming at high performance. The classical response time bound for untyped DAG tasks was proposed by Graham [39], [40]. Based on [39], [40], the response time analysis for multiple untyped DAGs (in which the vertices
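For reference, Graham's classical bound applies to an untyped DAG under work-conserving scheduling on m identical cores. It states that R ≤ len(G) + (vol(G) − len(G))/m, where vol(G) is the total execution time of all vertices and len(G) is the length of the longest path. The sketch below computes both quantities with a longest-path pass over a topological order (Kahn's algorithm). The dict-based graph encoding and the function names are our own assumptions, not the paper's.

```python
from collections import deque
from fractions import Fraction


def longest_path_length(wcet, edges):
    """len(G): length of the longest path, where wcet maps vertex -> c(v)."""
    succs = {v: [] for v in wcet}
    indeg = {v: 0 for v in wcet}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    finish = dict(wcet)  # longest path ending at v, including c(v)
    queue = deque(v for v in wcet if indeg[v] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in succs[u]:
            finish[v] = max(finish[v], finish[u] + wcet[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(wcet):
        raise ValueError("graph contains a cycle")
    return max(finish.values(), default=0)


def graham_bound(wcet, edges, m):
    """Graham's bound len(G) + (vol(G) - len(G)) / m, returned as an exact Fraction."""
    if m < 1:
        raise ValueError("m must be at least 1")
    length = longest_path_length(wcet, edges)
    volume = sum(wcet.values())
    return length + Fraction(volume - length, m)
```

Consider a four-vertex diamond v1 → {v2, v3} → v4 with execution times 2, 3, 4 and 1 on m = 2 cores. Here len(G) = 7 and vol(G) = 10, so the bound is 7 + 3/2 = 17/2.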

Platform model

The heterogeneous multi-core platform consisting of $S$ types of cores is formulated as a collection of cores $C = \{C_1, \ldots, C_S\}$, where $C_s$ ($s \in [1, S]$) is the set of the $s$-th type of cores. For convenience, we let $m_s$ be the cardinality of $C_s$, i.e., $m_s = |C_s|$.

Task model

The parallel task is formulated as a typed DAG model $G = (V, E, \Gamma, c)$, where $V$ is the set of vertices, $E$
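The definition above is cut off after E. As an assumption for illustration, the sketch reads Γ as the type function mapping each vertex to a core type s ∈ [1, S], and c as the worst-case execution time of each vertex. It stores G and derives two things:

- the per-type volume vol_s(G), the total execution time of the type-s vertices, which typed response time bounds are built from;
- a check that the platform provides at least one core (m_s ≥ 1) for every vertex type.

The class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TypedDAG:
    wcet: dict[str, int]          # c(v): worst-case execution time of vertex v
    vtype: dict[str, int]         # assumed reading of Γ: core type s of vertex v
    edges: list[tuple[str, str]]  # E: (u, v) means u must finish before v starts

    def vol(self, s: int) -> int:
        """vol_s(G): total WCET of the vertices bound to type-s cores."""
        return sum(c for v, c in self.wcet.items() if self.vtype[v] == s)

    def topological_order(self) -> list[str]:
        """Kahn's algorithm; raises ValueError if E contains a cycle."""
        indeg = {v: 0 for v in self.wcet}
        succs = {v: [] for v in self.wcet}
        for u, v in self.edges:
            succs[u].append(v)
            indeg[v] += 1
        queue = deque(v for v, d in indeg.items() if d == 0)
        order = []
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in succs[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        if len(order) != len(self.wcet):
            raise ValueError("E contains a cycle; G is not a DAG")
        return order

    def fits(self, cores: dict[int, int]) -> bool:
        """True if every vertex type has m_s >= 1 cores on the platform."""
        return all(cores.get(s, 0) >= 1 for s in self.vtype.values())
```

Here `cores` plays the role of the platform model, mapping each type s to m_s = |C_s|.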

Existing WCRT bound

We briefly review prior results from the response time analysis literature, some of which we compare with the response time bound derived in this paper. In the known work on the considered model, the typed DAG task G is scheduled on the heterogeneous multi-core platform by a work-conserving scheduling algorithm. Under this algorithm, an eligible vertex of type s (i.e., one whose predecessors have all finished) must be executed whenever a core of type s is available. It also applies to
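The minimal discrete-event sketch below makes the work-conserving rule concrete. It rests on several assumptions:

- execution is non-preemptive;
- execution times are integers equal to the WCETs;
- only a single job of an acyclic G is simulated.

The function and parameter names are hypothetical. The tie-breaking `priority` is left as a free parameter because the excerpt does not specify how the known approaches order eligible vertices.

```python
import heapq


def simulate_work_conserving(wcet, vtype, edges, cores, priority=None):
    """Simulate one job of an (assumed acyclic) typed DAG under
    non-preemptive, work-conserving list scheduling.

    wcet: vertex -> execution time; vtype: vertex -> core type s;
    edges: (u, v) precedence pairs; cores: s -> m_s;
    priority: vertex -> sort key (smaller dispatches first), default vertex id.
    Returns (response_time, start_times).
    """
    if any(cores.get(vtype[v], 0) < 1 for v in wcet):
        raise ValueError("every vertex type needs at least one core")
    succs = {v: [] for v in wcet}
    npreds = {v: 0 for v in wcet}
    for u, v in edges:
        succs[u].append(v)
        npreds[v] += 1
    key = priority if priority is not None else {v: v for v in wcet}
    ready = {s: [v for v in wcet if npreds[v] == 0 and vtype[v] == s] for s in cores}
    idle = dict(cores)  # idle cores of each type
    running = []        # min-heap of (finish_time, vertex)
    start, t = {}, 0
    while True:
        # Work-conserving rule: while a type-s core is idle and a type-s
        # vertex is eligible, dispatch the eligible vertex.
        for s in cores:
            ready[s].sort(key=key.__getitem__)
            while idle[s] > 0 and ready[s]:
                v = ready[s].pop(0)
                start[v] = t
                idle[s] -= 1
                heapq.heappush(running, (t + wcet[v], v))
        if not running:
            return t, start
        # Advance to the next completion time and retire every vertex finishing then.
        t = running[0][0]
        while running and running[0][0] == t:
            _, v = heapq.heappop(running)
            idle[vtype[v]] += 1
            for w in succs[v]:
                npreds[w] -= 1
                if npreds[w] == 0:
                    ready[vtype[w]].append(w)
```

Consider the diamond v1 → {v2, v3} → v4, where v1, v3 and v4 are type 1 (m_1 = 2) and v2 is type 2 (m_2 = 1), with execution times 2, 3, 4 and 1. This schedule starts v2 and v3 together at time 2 and finishes at time 7.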

A new scheduling algorithm for the typed DAG task

In the previous section, we analyzed the existing bounds and their problems. These bounds divide the response time calculation of each path l into two parts: the length of path l, and the time during which the vertices on path l are blocked. The key to our work is to improve the accuracy of the blocked-time calculation. These bounds overestimate the number of vertices in the DAG task that can block the same-type vertices on path l that
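The two-part path decomposition above can be illustrated with a generic bound for work-conserving scheduling. This is a sketch under stated assumptions. It is not the paper's CPB, nor necessarily identical to any of the existing bounds. Each complete path λ is charged its length plus, for each core type s that appears on λ, the type-s workload off λ divided by m_s:

$R \le \max_\lambda \big[\mathrm{len}(\lambda) + \sum_{s \in \mathrm{types}(\lambda)} (\mathrm{vol}_s(G) - \mathrm{vol}_s(\lambda))/m_s\big]$

The reasoning is that whenever a ready vertex of λ waits, work conservation implies that all m_s cores of its type are busy with type-s vertices that are not on λ. The bound deliberately charges all such off-path workload, including vertices that may already have completed or not yet started, which is the kind of pessimism described in the text. Paths are enumerated explicitly, so the sketch suits small graphs only. The function names are our own.

```python
from fractions import Fraction


def complete_paths(vertices, edges):
    """Yield every source-to-sink path of an (assumed acyclic) DAG."""
    succs = {v: [] for v in vertices}
    has_pred = set()
    for u, v in edges:
        succs[u].append(v)
        has_pred.add(v)

    def extend(path):
        last = path[-1]
        if not succs[last]:
            yield path
        for w in succs[last]:
            yield from extend(path + [w])

    for v in vertices:
        if v not in has_pred:
            yield from extend([v])


def path_blocking_bound(wcet, vtype, edges, cores):
    """Return (bound, path) maximizing len(λ) + Σ_s (vol_s(G) - vol_s(λ)) / m_s
    over complete paths λ, summing only over core types present on λ."""
    vol = {}
    for v, c in wcet.items():
        vol[vtype[v]] = vol.get(vtype[v], 0) + c
    best, best_path = None, None
    for path in complete_paths(list(wcet), edges):
        length = sum(wcet[v] for v in path)
        on_path = {}
        for v in path:
            on_path[vtype[v]] = on_path.get(vtype[v], 0) + wcet[v]
        # Blocked time: same-type work off the path, spread over the m_s cores.
        blocked = sum(Fraction(vol[s] - on_path[s], cores[s]) for s in on_path)
        if best is None or length + blocked > best:
            best, best_path = length + blocked, path
    return best, best_path
```

Take the diamond v1 → {v2, v3} → v4 with execution times 2, 3, 4 and 1. Vertices v1, v3 and v4 are type 1 with m_1 = 2, and v2 is type 2 with m_2 = 1. For this graph the bound evaluates to 8, attained by the path v1 → v2 → v4. By contrast, a non-preemptive work-conserving schedule of the same DAG (v1 on [0, 2], v2 and v3 from time 2, v4 on [6, 7]) finishes at 7.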

Response time analysis for typed DAG task

We present a new response time analysis method, based on the RTA presented in (8), to support heterogeneous and parallel computation. The new bound, CPB, excludes as much as possible the workload that cannot block the parallel workload executed on the same type of cores. This reduces the self-interference factor, making the new response time upper bound more accurate than (5).

In Section 5, all the unit-nodes have been assigned to criticality sets and each

Evaluation

In this section, we experimentally compare our proposed response time analysis method CPB, which is based on the criticality allocation strategy, with the known WCRT bounds in terms of both precision and efficiency. Since the platform model studied in [36] differs from that of the other known WCRT bounds, we divide the experiments into two parts.

Conclusion

This paper presents a new scheduling algorithm and a new WCRT bound for typed DAG parallel tasks supporting heterogeneous computing, where the workload of each vertex in the typed DAG may only execute on a particular type of cores. The known WCRT bounds under work-conserving scheduling are pessimistic because they account for unnecessary blocked time from same-type vertices on parallel paths that have already completed or have not yet started.

Declaration of Competing Interest

Title: Real-Time Scheduling and Analysis of Parallel Tasks on Heterogeneous Multi-cores. Authors: Shuangshuang Chang, Xufeng Zhao, Qingxu Deng. We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service or company that could be construed as influencing the position presented in, or the review of, the manuscript.

Acknowledgement

This work was supported by the National Key R&D Program of China under Grant No. 2018YFB1702000, the Joint Funds of the National Natural Science Foundation of China under Grant No. U1908212, and the National Natural Science Foundation of China under Grant Nos. 61972076, 61871107 and 61602104.

Shuangshuang Chang received the M.S. degree in computer technology from Northeastern University, Shenyang, China, in 2016, where she is currently pursuing the Ph.D. degree. Her current research interests include embedded real-time systems, scheduling analysis in mixed-criticality systems, and security mechanisms of cyber-physical systems.

References (55)

  • D. Nadeau et al., Efficient large-scale heterogeneous debugging using dynamic tracing, J. Syst. Archit. (2019).
  • T. Li et al., Minimizing temperature and energy of real-time applications with precedence constraints on heterogeneous MPSoC systems, J. Syst. Archit. (2019).
  • M. Brandalero et al., Predicting performance in multi-core systems with shared reconfigurable accelerators, J. Syst. Archit. (2019).
  • S. Li, Parallel batch scheduling with inclusive processing set restrictions and non-identical capacities to minimize makespan, Eur. J. Oper. Res. (2017).
  • S. Karhi et al., On the optimality of the TLS algorithm for solving the online-list scheduling problem with two job types on a set of multipurpose machines, J. Comb. Optim. (2013).
  • L. Epstein et al., Scheduling with processing set restrictions: PTAS results for several variants, Int. J. Prod. Econ. (2011).
  • J.Y.-T. Leung et al., Scheduling with processing set restrictions: a survey, Int. J. Prod. Econ. (2008).
  • J.Y.-T. Leung et al., Scheduling with processing set restrictions: a literature update, Int. J. Prod. Econ. (2016).
  • E. Bini et al., Measuring the performance of schedulability tests, Real-Time Syst. (2005).
  • T. Langer et al., A survey of parallel hard-real time scheduling on task models and scheduling approaches, ARCS 2017; 30th International Conference on Architecture of Computing Systems (2017).
  • P. Corbett et al., Overview of the MPI-IO parallel I/O interface, IPPS'95 Workshop on Input/Output in Parallel and Distributed Systems (1995).
  • O.A.R. Board, OpenMP application program interface, version 5.0, 2018. …
  • A. Saifullah et al., Multi-core real-time scheduling for generalized parallel task models, Real-Time Syst. (2013).
  • J.C. Fonseca et al., A multi-DAG model for real-time parallel applications with conditional execution, Proceedings of the 30th Annual ACM Symposium on Applied Computing (2015).
  • X. Jiang et al., Semi-federated scheduling of parallel real-time tasks on multiprocessors, 2017 IEEE Real-Time Systems Symposium (RTSS) (2017).
  • R. Pathan et al., Scheduling parallel real-time recurrent tasks on multicore platforms, IEEE Trans. Parallel Distrib. Syst. (2017).
  • S. Baruah, The federated scheduling of systems of conditional sporadic DAG tasks, Proceedings of the 12th International Conference on Embedded Software (2015).

Xufeng Zhao was born in Panjin, Liaoning, China, in 1996. He received his bachelor's degree in Internet of Things engineering from Northeastern University, China, in 2018, and is currently a master's candidate in computer system architecture at Northeastern University, China. His research interests are broadly in embedded real-time systems, especially real-time scheduling on multi-core systems.

Zhenyu Liu was born in Nanyang, Henan, China, in 1995. He received his bachelor's degree in Internet of Things engineering from Northeastern University, China, in 2018, and is currently a master's candidate in computer applications technology at Northeastern University, China. His research interests include real-time embedded systems and cyber-physical systems.

Qingxu Deng received his Ph.D. degree in computer science from Northeastern University, China, in 1997. He is a professor in the School of Computer Science and Engineering, Northeastern University, China, where he serves as the Director of the Institute of Cyber-Physical Systems. His main research interests include cyber-physical systems, embedded systems, and real-time systems.
