Real-Time scheduling and analysis of parallel tasks on heterogeneous multi-cores

doi:10.1016/j.sysarc.2019.101704

Journal of Systems Architecture

Volume 105, May 2020, 101704

https://doi.org/10.1016/j.sysarc.2019.101704

Abstract

Heterogeneous multi-core and parallel architectures have recently gained much attention because they exploit the strengths of different architectures to offer higher performance. In this paper, we study the real-time scheduling of directed acyclic graph (DAG) tasks on a heterogeneous multi-core platform, where a task contains different types of vertices and the workload of each vertex must execute on its particular type of core. Previous work schedules such typed DAG tasks with a work-conserving strategy, which leads to pessimistic schedulability tests. To address this, we propose a novel scheduling algorithm for typed DAG tasks that assigns each vertex a varying criticality depending on the vertex's remaining workload; a vertex with higher criticality is more urgent to execute. Under this scheduling strategy, we propose a new worst-case response time (WCRT) bound to verify the schedulability of DAG tasks that support heterogeneous computing. Experiments with randomly generated workloads show that our new WCRT bound is about 20% more accurate on average than the existing bounds.

Introduction

To meet increasing performance requirements, parallel hardware architectures have become mainstream in the multi-core embedded field. Parallel programming models are fundamental to exploiting the performance capabilities of these architectures [1], [2], [3]. In recent years, research on parallel task scheduling with real-time constraints has made great progress [4]. Researchers have developed multiple parallel programming paradigms, such as MPI [5] and OpenMP [6], [7], and parallel programming languages such as CilkPlus [8], to aid developers in creating parallel programs. All these paradigms currently support intra-task parallelism, where a single task consists of multiple parallel code parts that can be executed simultaneously. The DAG task model is a promising model for formulating software with intra-task parallelism. The real-time scheduling and analysis of the DAG parallel task model has gained a lot of attention in the real-time and High-Performance Computing communities [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18].

Moreover, heterogeneous hardware architectures, which can utilize specialized processing capabilities and offer higher performance and energy efficiency than homogeneous architectures, have received more and more attention [19], [20], [21], [22], [23]. In general, a heterogeneous hardware architecture consists of components that are asymmetric in performance and functionality [24], [25]; it integrates low-power general-purpose multi-cores (known as the host) with specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPUs), as in NVIDIA Tegra X1 [26] or Xilinx UltraScale [27]. Heterogeneous multiprocessor systems on a chip (MPSoCs), one kind of heterogeneous hardware architecture, have been widely used in many real-time embedded systems. As introduced in [28], [29], [30], MPSoCs can be broadly classified by performance heterogeneity and functional heterogeneity. Performance heterogeneity means that cores with the same functionality (i.e., the same instruction set architecture (ISA)) but different power-performance characteristics are integrated. Functional heterogeneity means that cores with very different functionality (i.e., different ISAs) are interspersed on the same die. Jetson TX2 [31] exhibits performance heterogeneity since it adopts the big.LITTLE architecture [32], which integrates high-performance cores (big cores) with low-power cores (LITTLE cores). It contains two Denver cores (high-performance cores) and four ARM Cortex-A57 cores (low-power cores). The Denver cores and ARM cores have different power-performance characteristics, but they are coherent and share the same ISA.

Current parallel programming languages tend to support heterogeneous multi-cores. For example, in OpenMP [6], the proc_bind clause can be used to control how threads are mapped to processing cores. In CUDA [33], the cudaSetDevice function can be used to select the device on which subsequent operations execute. In OpenCL [34], the clCreateCommandQueue function can be used to create a command queue for a specific device.

In this paper, we consider real-time scheduling of typed DAG tasks on heterogeneous multi-cores, where each vertex is explicitly bound to a specific type of core for execution. Binding code snippets of a program to a specific type of core is a common operation in heterogeneous multi-core scheduling and can be easily implemented in mainstream parallel programming frameworks and operating systems.

The real-time scheduling of typed DAG tasks on heterogeneous platforms is studied in [35], [36], [37]. All of these works schedule DAG tasks under a work-conserving algorithm; their response time analysis methods are as follows. Jeffrey et al. [35] proposed the first WCRT bound for the general typed DAG task model. However, Jeffrey’s response time bound is very pessimistic. Serrano et al. [36] proposed a response time bound for a specific typed DAG task model with two types of cores, which has certain limitations. Han et al. [37] developed two response time bounds: the first dominates Jeffrey’s bound [35] in analysis precision, and the second significantly improves the analysis precision by exploiting more detailed information about the task graph structure. Even so, Han’s response time bounds are still very pessimistic. When analyzing the worst-case response time of each path, they take into account blocked time from vertices of the same type on different paths that have already completed or have not yet started executing, which is unnecessary. Yang et al. [38] studied the scheduling of typed DAG tasks by decomposing each DAG task into a set of independent subtasks with artificial release times and deadlines.

This paper aims to obtain a more accurate WCRT upper bound for typed DAG tasks. To solve the problems in the earlier work, we propose a criticality allocation strategy, which assigns a criticality to each vertex. The criticality determines the urgency of vertex execution and decreases as the remaining workload of the vertex decreases. Based on this strategy, we propose a new WCRT bound to verify the schedulability of DAG tasks supporting heterogeneous computing. It reduces the number of potentially parallel vertices to a relatively small range. Experiments with randomly generated workloads show that our proposed criticality allocation strategy and new bound are significantly more precise than the existing bounds.

The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 introduces the system model. Section 4 enumerates the known WCRT upper bounds for the same model as ours and analyzes the problems in these bounds. Section 5 presents a new scheduling strategy for typed DAG tasks based on criticality allocation. Section 6 describes the response time analysis for our new scheduling strategy. Section 7 presents the experimental results comparing our approach with existing WCRT bounds. Finally, Section 8 concludes the paper and highlights some future research directions.

Section snippets

Related work

Parallel and heterogeneous hardware architectures have become mainstream in the embedded real-time domain to cope with increasing performance requirements. Scheduling an application modeled as a DAG, a model fundamental to parallel programming, is a key problem when aiming at high performance. The classical response time bound for untyped DAG tasks was proposed by Graham [39], [40]. Based on [39], [40], the response time analysis for multiple untyped DAG (in which the vertices
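As a point of reference, the classical bound attributed to Graham is commonly stated for a single untyped DAG on m identical cores as R ≤ len(G) + (vol(G) − len(G))/m, where vol(G) is the total workload and len(G) is the length of the longest path. The following Python sketch computes it; the function name, the input format, and the example DAG are illustrative assumptions, not taken from the paper.

```python
from functools import lru_cache


def graham_bound(wcet, edges, m):
    """Classical response time bound for an untyped DAG on m identical cores.

    wcet  : dict mapping vertex -> worst-case execution time (assumed format)
    edges : iterable of (u, v) precedence pairs, meaning u must finish before v
    m     : number of identical cores
    """
    succ = {v: [] for v in wcet}
    for u, v in edges:
        succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        # Length of the longest path starting at v, including v itself.
        return wcet[v] + max((longest_from(w) for w in succ[v]), default=0)

    length = max(longest_from(v) for v in wcet)  # len(G)
    volume = sum(wcet.values())                  # vol(G)
    return length + (volume - length) / m


# Hypothetical diamond-shaped DAG: a -> {b, c} -> d
wcet = {"a": 1, "b": 4, "c": 2, "d": 1}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(graham_bound(wcet, edges, m=2))  # 7.0
```

With a single core the bound collapses to vol(G), and as m grows it approaches len(G), which is why the workload off the critical path is the part that typed analyses try to account for more precisely.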

Platform model

The heterogeneous multi-core platform consisting of S types of cores is formulated as a collection of cores $C = \{C_{1}, \dots, C_{S}\}$, where $C_{s}$ ($s \in [1, S]$) is the set of cores of the s-th type. For convenience, we let $m_{s}$ be the cardinality of $C_{s}$, i.e., $m_{s} = |C_{s}|$.
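The platform definition maps directly onto a small data structure. The sketch below is one possible Python encoding; the class name `Platform`, the core identifiers, and the example configuration are hypothetical and only illustrate the symbols S, C_s and m_s.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    # cores[s] plays the role of C_s: the set of core identifiers of type s.
    cores: dict = field(default_factory=dict)

    @property
    def S(self):
        # Number of core types.
        return len(self.cores)

    def m(self, s):
        # m_s = |C_s|, the number of cores of type s.
        return len(self.cores[s])


# Hypothetical platform: two cores of type 1 and one core of type 2.
platform = Platform({1: {"cpu0", "cpu1"}, 2: {"gpu0"}})
print(platform.S, platform.m(1), platform.m(2))  # 2 2 1
```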

Task model

The parallel task is formulated as a typed DAG model $G = (V, E, \Gamma, c)$, where V is the set of vertices, E
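Because the snippet above is cut off, the roles of Γ and c are not visible here. The sketch below assumes, as is common in the typed DAG literature, that Γ maps each vertex to its core type and c maps each vertex to its worst-case execution time; the class name `TypedDAG`, its methods, and the example graph are hypothetical. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from collections import defaultdict
from graphlib import TopologicalSorter


class TypedDAG:
    def __init__(self, wcet, vtype, edges):
        self.wcet = dict(wcet)    # assumed meaning of c: vertex -> WCET
        self.vtype = dict(vtype)  # assumed meaning of Gamma: vertex -> core type
        self.pred = {v: set() for v in self.wcet}
        for u, v in edges:
            self.pred[v].add(u)
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        self.order = list(TopologicalSorter(self.pred).static_order())

    def volume_by_type(self):
        # Total workload that must run on each core type.
        vol = defaultdict(int)
        for v, c in self.wcet.items():
            vol[self.vtype[v]] += c
        return dict(vol)

    def length(self):
        # Longest path length, computed in topological order.
        finish = {}
        for v in self.order:
            finish[v] = self.wcet[v] + max(
                (finish[u] for u in self.pred[v]), default=0
            )
        return max(finish.values())


# Hypothetical task: v1 -> {v2, v3} -> v4, with v3 bound to core type 2.
dag = TypedDAG(
    wcet={"v1": 2, "v2": 3, "v3": 4, "v4": 1},
    vtype={"v1": 1, "v2": 1, "v3": 2, "v4": 1},
    edges=[("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")],
)
print(dag.volume_by_type(), dag.length())  # {1: 6, 2: 4} 7
```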

Existing WCRT bound

We briefly review prior results from the response time analysis literature, some of which we will compare with the response time bound derived in this paper. In the known work on the considered model, the typed DAG task G is scheduled on the heterogeneous multi-core platform by a work-conserving scheduling algorithm, under which an eligible vertex of type s, i.e., one whose predecessors have all finished, must be executed if cores of type s are available. It also applies to
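The work-conserving rule described above can be illustrated with a small discrete-time simulation. This is a minimal sketch under stated assumptions, not the analysis from the paper: execution times are positive integers, time advances in unit slices, and ties among eligible vertices of the same type are broken by dictionary insertion order. The function name and the example task are hypothetical.

```python
def simulate_work_conserving(wcet, vtype, edges, cores):
    """Return the finishing time of a typed DAG under one work-conserving schedule.

    wcet  : dict vertex -> positive integer execution time
    vtype : dict vertex -> core type
    edges : iterable of (u, v) precedence pairs
    cores : dict core type -> number of cores of that type
    """
    pred = {v: set() for v in wcet}
    for u, v in edges:
        pred[v].add(u)
    remaining = dict(wcet)
    done = set()
    t = 0
    while len(done) < len(wcet):
        # A vertex is eligible once all of its predecessors have finished.
        eligible = [v for v in wcet if v not in done and pred[v] <= done]
        used = {s: 0 for s in cores}
        running = []
        for v in eligible:
            s = vtype[v]
            # Work-conserving: never leave a type-s core idle while an
            # eligible type-s vertex is waiting.
            if used.get(s, 0) < cores.get(s, 0):
                used[s] += 1
                running.append(v)
        if not running:
            raise ValueError("no progress possible: missing cores or cyclic graph")
        for v in running:
            remaining[v] -= 1
            if remaining[v] <= 0:
                done.add(v)
        t += 1
    return t


# Hypothetical task: a -> {b, c, d} -> e, where b and d compete for type-1 cores.
wcet = {"a": 2, "b": 3, "c": 4, "d": 2, "e": 1}
vtype = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1}
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("c", "e"), ("d", "e")]
print(simulate_work_conserving(wcet, vtype, edges, {1: 1, 2: 1}))  # 8
print(simulate_work_conserving(wcet, vtype, edges, {1: 2, 2: 1}))  # 7
```

With one type-1 core, vertex d waits behind b and the task finishes one unit later than the longest path; adding a second type-1 core removes that blocking. A single simulated run only shows one possible schedule, whereas a WCRT bound must hold for every work-conserving schedule.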

A new scheduling algorithm for the typed DAG task

In the previous section, we analyzed the existing bounds and their problems. These bounds show that the response time of each path l is computed in two parts: the length of the path l, and the time for which the vertices on the path l are blocked. The key to our work is to improve the accuracy of the blocked-time calculation. These bounds overestimate the number of vertices in the DAG task which can block the vertices with the same type on the path l that
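To make the notion of blocking vertices concrete, the sketch below lists, for each vertex on a given path, the vertices of the same type that are neither its ancestors nor its descendants, since only such vertices can compete with it for a core. This is a generic structural filter written for illustration, not the criticality allocation strategy of the paper; the function names and the example task are hypothetical.

```python
def reachable(adj, start):
    # All vertices reachable from start by following adj (start excluded).
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def potential_blockers(vtype, edges, path):
    """Map each path vertex to the same-type vertices that may run in parallel with it."""
    succ = {v: [] for v in vtype}
    pred = {v: [] for v in vtype}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    blockers = {}
    for v in path:
        # Ancestors and descendants are ordered with respect to v,
        # so they can never occupy a core while v is eligible.
        related = reachable(succ, v) | reachable(pred, v) | {v}
        blockers[v] = {
            u for u in vtype if u not in related and vtype[u] == vtype[v]
        }
    return blockers


# Hypothetical task: a -> {b, c, d} -> e, with c bound to core type 2.
vtype = {"a": 1, "b": 1, "c": 2, "d": 1, "e": 1}
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("c", "e"), ("d", "e")]
print(potential_blockers(vtype, edges, ["a", "b", "e"]))
```

On the path a, b, e only b has a potential blocker, namely d; vertex c is parallel to b but runs on a different core type, so it cannot delay b.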

Response time analysis for typed DAG task

We present a new response time analysis method to support heterogeneous and parallel computation based on the RTA presented in (8). The new bound, CPB, excludes as much as possible of the parallel workload on cores of the same type that cannot cause any blocking. This reduces the self-interference factor and makes the new response time upper bound more accurate than (5).

In Section 5, all the unit-nodes have been assigned to criticality sets and each

Evaluation

In this section, we experimentally compare our proposed response time analysis method CPB, which is based on the criticality allocation strategy, with the known WCRT bounds in terms of both precision and efficiency. Because the platform model studied in [36] differs from that of the other known WCRT bounds, we divide the experiments into two parts.

Conclusion

This paper presents a new scheduling algorithm and a new WCRT bound for typed DAG parallel tasks supporting heterogeneous computing, where the workload of each vertex in the typed DAG is only allowed to execute on a particular type of core. The known WCRT bounds under work-conserving scheduling are pessimistic because they account for unnecessary blocked time from vertices that have the same type and lie on parallel paths but have already completed or have not yet started.

Declaration of Competing Interest

Title: Real-Time Scheduling and Analysis of Parallel Tasks on Heterogeneous Multi-cores. Authors: Shuangshuang Chang, Xufeng Zhao, Qingxu Deng. We declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript.

Acknowledgement

This work was supported by the National Key R&D Program of China under Grant No. 2018YFB1702000, the Joint Funds of the National Natural Science Foundation of China under Grant No. U1908212, the National Natural Science Foundation of China under Grant No. 61972076, the National Natural Science Foundation of China under Grant No. 61871107 and the National Natural Science Foundation of China under Grant No. 61602104.

Shuangshuang Chang received the M.S. degree in computer technology from Northeastern University, Shenyang, China, in 2016, where she is currently pursuing the Ph.D. degree. Her current research interests include embedded real-time systems, scheduling analysis in mixed-criticality systems, and security mechanisms of cyber-physical systems.

References (55)

T. Yang et al., Building real-time parallel task systems on multi-cores: a hierarchical scheduling approach, J. Syst. Archit. (2019)
M. Han et al., Bounding carry-in interference for synchronous parallel tasks under global fixed-priority scheduling, J. Syst. Archit. (2018)
P. Mejia-Alvarez et al., Evaluation framework for energy-aware multiprocessor scheduling in real-time systems, J. Syst. Archit. (2019)
H. Du et al., Scope-aware data cache analysis for OpenMP programs on multi-core processors, J. Syst. Archit. (2019)
J. Li et al., Analysis of federated and global scheduling for parallel real-time tasks, 2014 26th Euromicro Conference on Real-Time Systems (2014)
J. Lee et al., Thread-level priority assignment in global multiprocessor scheduling for DAG tasks, J. Syst. Softw. (2016)
Y. Kim et al., Data dependency reduction for high-performance FPGA implementation of deflate compression algorithm, J. Syst. Archit. (2019)
A. Rodríguez et al., Exploring heterogeneous scheduling for edge computing with CPU and FPGA MPSoCs, J. Syst. Archit. (2019)
T. Xia et al., Ker-one: a new hypervisor managing FPGA reconfigurable accelerators, J. Syst. Archit. (2019)
A.G. Blaiech et al., A survey and taxonomy of FPGA-based deep learning accelerators, J. Syst. Archit. (2019)
D. Nadeau et al., Efficient large-scale heterogeneous debugging using dynamic tracing, J. Syst. Archit. (2019)
T. Li et al., Minimizing temperature and energy of real-time applications with precedence constraints on heterogeneous MPSoC systems, J. Syst. Archit. (2019)
M. Brandalero et al., Predicting performance in multi-core systems with shared reconfigurable accelerators, J. Syst. Archit. (2019)
S. Li, Parallel batch scheduling with inclusive processing set restrictions and non-identical capacities to minimize makespan, Eur. J. Oper. Res. (2017)
S. Karhi et al., On the optimality of the TLS algorithm for solving the online-list scheduling problem with two job types on a set of multipurpose machines, J. Comb. Optim. (2013)
L. Epstein et al., Scheduling with processing set restrictions: PTAS results for several variants, Int. J. Prod. Econ. (2011)
J.Y.-T. Leung et al., Scheduling with processing set restrictions: a survey, Int. J. Prod. Econ. (2008)
J.Y.-T. Leung et al., Scheduling with processing set restrictions: a literature update, Int. J. Prod. Econ. (2016)
E. Bini et al., Measuring the performance of schedulability tests, Real-Time Syst. (2005)
T. Langer et al., A survey of parallel hard-real time scheduling on task models and scheduling approaches, ARCS 2017; 30th International Conference on Architecture of Computing Systems (2017)
P. Corbett et al., Overview of the MPI-IO parallel I/O interface, IPPS95 Workshop on Input/Output in Parallel and Distributed Systems (1995)
O.A.R. Board, OpenMP application program interface, version 5.0, 2018....
A. Saifullah et al., Multi-core real-time scheduling for generalized parallel task models, Real-Time Syst. (2013)
J.C. Fonseca et al., A multi-DAG model for real-time parallel applications with conditional execution, Proceedings of the 30th Annual ACM Symposium on Applied Computing (2015)
X. Jiang et al., Semi-federated scheduling of parallel real-time tasks on multiprocessors, 2017 IEEE Real-Time Systems Symposium (RTSS) (2017)
R. Pathan et al., Scheduling parallel real-time recurrent tasks on multicore platforms, IEEE Trans. Parallel Distrib. Syst. (2017)
S. Baruah, The federated scheduling of systems of conditional sporadic DAG tasks, Proceedings of the 12th International Conference on Embedded Software (2015)


Xufeng Zhao was born in Panjin, Liaoning, China, in 1996. He received his bachelor’s degree in internet of things engineering from Northeastern University, China, in 2018, and is a master’s candidate in computer system architecture at Northeastern University, China. His research interests are broadly in embedded real-time systems, especially real-time scheduling on multi-core systems.

Zhenyu Liu was born in Nanyang, Henan, China, in 1995. He received his bachelor’s degree in internet of things engineering from Northeastern University, China, in 2018, and is a master’s candidate in computer applications technology at Northeastern University, China. His research interests include real-time embedded systems and cyber-physical systems.

Qingxu Deng received his Ph.D. degree in computer science from Northeastern University, China, in 1997. He is a professor at the School of Computer Science and Engineering, Northeastern University, China, where he serves as the Director of the Institute of Cyber-Physical Systems. His main research interests include cyber-physical systems, embedded systems, and real-time systems.
