Elsevier

Future Generation Computer Systems

Volume 100, November 2019, Pages 165-175
Lifetime-aware real-time task scheduling on fault-tolerant mixed-criticality embedded systems

https://doi.org/10.1016/j.future.2019.05.022

Highlights

  • Conduct the first study on maximizing the lifetime of fault-tolerant mixed-criticality embedded systems.

  • Consider both transient faults and thermal cycling incurred permanent faults.

  • Present a mixed-integer linear programming (MILP) formulation.

  • Propose a novel time-efficient heuristic.

  • Carry out experiments on both real-world and synthetic benchmarks.

Abstract

In recent years, the design of mixed-criticality embedded systems subject to transient faults has attracted much attention. From the perspective of system users, it is desirable to optimize system lifetime while meeting all design constraints. Existing task scheduling algorithms cannot be used to maximize the lifetime of mixed-criticality embedded systems since they do not take into account the impact of providing transient fault tolerance on system lifetime. This paper investigates the problem of prolonging the lifetime of mixed-criticality embedded systems on a uniprocessor equipped with the dynamic voltage and frequency scaling (DVFS) technique. Transient faults and permanent faults incurred by thermal cycling are considered simultaneously in the system lifetime optimization under the constraints of safety requirements and schedule timeliness. A mixed-integer linear programming (MILP) formulation is first presented to deal with the task scheduling problem. Since the MILP method is time-consuming for large-scale systems, a heuristic based on the cross-entropy method is then proposed to achieve a better tradeoff between the system lifetime achieved by the derived task schedule and the runtime consumed to generate the task schedule. Experiments based on synthetic and real-world benchmarks are conducted, and simulation results demonstrate that the proposed heuristic improves system lifetime by up to 32.73% with acceptable runtime as compared to benchmarking methods.

Introduction

In many safety-related fields such as the avionics and automotive industries, tasks of different importance (i.e., criticality levels) usually co-exist [1], [2]. For example, the music-playing and flight-control tasks in a flight management system clearly have unequal criticality levels and require distinct degrees of assurance. However, traditional embedded systems only allow tasks with the same criticality level, which means that tasks with different criticality levels need to run on multiple separate hardware platforms. As a result, the cost, weight, and power consumption of these hardware platforms increase significantly, or even become unaffordable, as the total number of task criticality levels increases. A promising trend in the design of today's advanced embedded systems is to integrate multiple functionalities (i.e., tasks) with at least two criticality levels onto a common, shared computing platform. These emerging embedded systems that permit the co-existence of different task criticality levels are called mixed-criticality (MC) embedded systems [3], [4]. In an MC embedded system, tasks of higher importance should be assigned higher criticality levels in order to guarantee the safety requirements of these tasks.

In the past few years, there has been widespread interest in MC embedded systems. However, as detailed in [4], most of the existing techniques (e.g., the mechanisms in [5], [6], [7], [8], [9]) dedicated to MC embedded systems cannot guarantee dependable system operation because they do not treat fault tolerance as a design constraint. As the susceptibility of modern processors to soft errors is increasing dramatically with the relentless scaling of feature size and operating voltage [10], [11], fault-tolerance management is a significant and pressing research issue in MC embedded systems. Recently, several research works [12], [13], [14], [15], [16] have been devoted to the design of fault-tolerant MC embedded systems. Unfortunately, these existing task scheduling algorithms only tolerate transient faults and neglect permanent faults. Directly adopting such algorithms will inevitably accelerate processor wear-out, causing permanent faults to occur earlier and system lifetime to decay drastically. From the perspective of system users, it is desirable to prolong the lifetime of MC embedded systems that can handle both permanent and transient faults. However, to the best of our knowledge, no existing work investigates the lifetime optimization of MC embedded systems that deal with permanent and transient faults simultaneously.

In this paper, we conduct the first study on prolonging the lifetime of MC embedded systems on a uniprocessor equipped with the dynamic voltage and frequency scaling (DVFS) technique. We take into account both transient faults and permanent faults incurred by thermal cycling in the system lifetime optimization under the constraints of safety requirements and schedule timeliness. By judiciously determining (1) the task operating frequency, (2) the task re-execution number, and (3) the task execution order, our proposed solution can generate a lifetime-optimal task schedule while satisfying all design requirements. The main contributions of this paper are summarized below.

  • We present a mixed-integer linear programming (MILP) formulation to schedule independent real-time tasks for maximizing the lifetime of uniprocessor MC embedded systems.

  • We propose a time-efficient solution, built on the cross-entropy method (CEM), to the formulated scheduling problem.

  • Experimental results based on synthetic and real-life benchmarks demonstrate that the developed approach prolongs system lifetime by up to 32.73%.

The remainder of the paper is organized as follows. Section 2 surveys the related works on MC embedded systems. We briefly introduce the system architecture and models in Section 3. In Section 4, we describe the problem definition and give an overview of our developed algorithms. Section 5 presents an MILP formulation for the studied task scheduling problem while Section 6 shows the developed CEM-based heuristic. Section 7 numerically investigates the performance achieved by the proposed scheme and Section 8 gives concluding remarks.

Section snippets

Related work

Considerable research efforts have been devoted to the design of MC embedded systems. Baruah et al. [5] presented an earliest-deadline-first algorithm with virtual deadlines to schedule tasks with any number of defined criticality levels. Huang et al. [6] proposed an effective scheduling algorithm integrating the DVFS technique for optimizing the overall power dissipation of MC embedded systems. Han et al. [7] derived a criticality-aware utilization bound for feasibility tests and developed a

System architecture and models

This section briefly introduces the system architecture and models including task model, temperature model, fault model, and safety requirement model. Table 1 summarizes the main symbols utilized in this section.

Problem definition and solution overview

In this section, we first give a definition of the studied problem and then outline the proposed solution.

MILP formulation

This section presents an MILP-based method for dealing with the problem of lifetime maximization under the safety requirements and the task schedulability constraint for fault-tolerant MC embedded systems. Table 3 summarizes the main symbols used in this section. For ease of presentation, the following variables are defined.

\[
\Lambda_{i,q} =
\begin{cases}
1 & \text{if task } \tau_i \text{ is executed at frequency } f_q \\
0 & \text{otherwise}
\end{cases}
\]

$\varpi_i$: the start execution time of task $\tau_i$

$\varepsilon_{i,q}$: the total execution number of $\tau_i$ running at $f_q$
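To make the roles of the variables Λ and ε concrete, the following Python sketch encodes a small hand-written assignment into the two matrices. It is an illustration only: the helper name `encode_assignment`, the example values, and the assumption that each task runs at exactly one frequency level are hypothetical and are not taken from the paper's formulation.

```python
def encode_assignment(freq_index, executions, n_levels):
    """Build the binary matrix Lambda and the count matrix epsilon.

    freq_index[i]  -- index q of the frequency chosen for task i
                      (assumption: exactly one level per task)
    executions[i]  -- total number of executions of task i at that level
                      (assumption: first run plus any re-executions)
    """
    n_tasks = len(freq_index)
    lam = [[0] * n_levels for _ in range(n_tasks)]
    eps = [[0] * n_levels for _ in range(n_tasks)]
    for i, (q, k) in enumerate(zip(freq_index, executions)):
        lam[i][q] = 1   # Lambda[i][q] = 1 marks the chosen frequency level
        eps[i][q] = k   # epsilon[i][q] counts executions at that level
    return lam, eps


# Hypothetical example: two tasks, three frequency levels.
lam, eps = encode_assignment(freq_index=[1, 0], executions=[2, 3], n_levels=3)
```

Under the one-level-per-task assumption, every row of `lam` sums to one and `eps` is nonzero only where `lam` is one.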

CEM-Based heuristic

The MILP-based method generates a lifetime-optimal task schedule with consideration of all design constraints. However, this method cannot efficiently tackle the studied scheduling problem as the size of the problem increases. Given this, we develop a time-efficient CEM-based heuristic to overcome the runtime shortcoming of the MILP-based method. This section first outlines the CEM's theoretical foundation, and then describes in detail the developed CEM-based heuristic. Table 4 summarizes
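As a rough illustration of the general cross-entropy idea (sample candidates, keep an elite fraction, and shift the sampling distribution toward the elite), the Python sketch below runs a generic CEM loop on a toy frequency-selection problem. It is not the heuristic developed in this paper: the function names, cycle counts, frequency levels, deadline, parameter defaults, and scoring function are all hypothetical stand-ins, and the toy score is not the paper's lifetime model.

```python
import random


def cross_entropy_search(n_tasks, n_levels, score, n_samples=200,
                         elite_frac=0.1, smoothing=0.7, n_iters=30, seed=0):
    """Generic cross-entropy method over one categorical choice per task."""
    rng = random.Random(seed)
    # Start from a uniform distribution over levels for every task.
    probs = [[1.0 / n_levels] * n_levels for _ in range(n_tasks)]
    best, best_score = None, float("-inf")
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Sample candidate assignments from the current distribution.
        samples = [
            [rng.choices(range(n_levels), weights=probs[i])[0]
             for i in range(n_tasks)]
            for _ in range(n_samples)
        ]
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        if score(elite[0]) > best_score:
            best, best_score = elite[0], score(elite[0])
        # Move each task's distribution toward the elite's empirical frequencies.
        for i in range(n_tasks):
            for q in range(n_levels):
                freq = sum(1 for s in elite if s[i] == q) / n_elite
                probs[i][q] = smoothing * freq + (1 - smoothing) * probs[i][q]
    return best, best_score


# Toy problem (all values hypothetical).
CYCLES = [4e7, 1e8, 6e8]         # worst-case cycles per task
LEVELS = [0.5e9, 1.0e9, 2.0e9]   # available frequencies in Hz
DEADLINE = 0.6                   # common deadline in seconds


def toy_score(choice):
    """Penalize deadline misses; otherwise prefer lower frequencies."""
    total_time = sum(c / LEVELS[q] for c, q in zip(CYCLES, choice))
    if total_time > DEADLINE:
        return float("-inf")
    return -sum(LEVELS[q] ** 2 for q in choice)


best, best_score = cross_entropy_search(len(CYCLES), len(LEVELS), toy_score)
```

The smoothing factor keeps every level's probability above zero in early iterations, so the search does not commit to a single assignment too soon. A real scheduler would also need to encode re-execution counts and execution order, which this sketch omits.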

Simulation setup

Two sets of simulations are carried out: one is conducted on synthetic benchmarks, and the other is conducted on real-world benchmarks. In the first set of simulation experiments, a random task generator implemented in C++ is used to produce multiple synthetic benchmarks (i.e., task sets). For each task in a synthetic benchmark, its worst-case execution cycles incorporating the CPU cycles of an acceptance test are randomly selected from the interval of [4×10^7, 6×10^8], which are generated on the
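The sampling step for worst-case execution cycles can be sketched as follows. This is a minimal Python illustration rather than the authors' C++ generator: the function name `generate_task_set`, the dictionary layout, and the choice of a uniform integer distribution over the stated interval are assumptions, since the excerpt does not specify the distribution.

```python
import random

# Interval bounds taken from the text: [4x10^7, 6x10^8] cycles.
WCEC_MIN = 4 * 10**7
WCEC_MAX = 6 * 10**8


def generate_task_set(n_tasks, seed=None):
    """Return a list of synthetic tasks with randomly drawn cycle counts.

    Assumption: cycles are drawn uniformly (as integers) from the interval.
    """
    rng = random.Random(seed)
    return [
        {"id": i, "wcec": rng.randint(WCEC_MIN, WCEC_MAX)}
        for i in range(n_tasks)
    ]


# Hypothetical usage: a reproducible task set of five tasks.
tasks = generate_task_set(5, seed=1)
```

Passing a fixed seed makes a benchmark reproducible across runs, which helps when comparing schedulers on the same task sets.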

Conclusion

In this paper, we tackle the lifetime optimization problem for MC embedded systems by designing effective task scheduling schemes. We consider both transient faults and permanent faults incurred by thermal cycling. In addition to an MILP-based algorithm, we also present a time-efficient CEM-based heuristic to achieve system lifetime maximization. The task re-execution technique is used in both algorithms to tolerate transient faults. Two sets of simulation experiments based on synthetic task

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work.

Kun Cao is currently pursuing the Ph.D. degree with the Department of Computer Science and Technology, East China Normal University, Shanghai, China. His current research interests are in the areas of Internet of things, cyber physical systems, 3-D ICs, and real-time embedded systems. He was a recipient of the Reviewer Award from Journal of Circuits, Systems, and Computers, in 2016.

References (48)

  • Z. Li et al., Reliability guaranteed energy minimization on mixed-criticality systems, J. Syst. Softw. (2016)
  • J. Zhou et al., Reliability and temperature constrained task scheduling for makespan minimization on heterogeneous multi-core platforms, J. Syst. Softw. (2017)
  • A. Burns et al., Robust mixed-criticality systems, IEEE Trans. Comput. (2018)
  • J. Caplan et al., Mapping and scheduling mixed-criticality systems with on-demand redundancy, IEEE Trans. Comput. (2018)
  • G. Giannopoulou, N. Stoimenov, P. Huang, L. Thiele, Scheduling of mixed-criticality applications on resource-sharing...
  • A. Burns et al., A survey of research into mixed criticality systems, ACM Comput. Surv. (2017)
  • S. Baruah et al., Preemptive uniprocessor scheduling of mixed-criticality sporadic task systems, J. ACM (2015)
  • P. Huang, P. Kumar, G. Giannopoulou, L. Thiele, Energy efficient DVFS scheduling for mixed-criticality systems, in:...
  • J. Han et al., Multicore mixed-criticality systems: Partitioned scheduling and utilization bound, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2018)
  • D. Maxim, R. Davis, L. Cucu-Grosjean, A. Easwaran, Probabilistic analysis for mixed criticality systems using fixed...
  • R. Davis, S. Altmeyer, S. Burns, Mixed criticality systems with varying context switch costs, in: Proceedings of the...
  • S. Kang, H. Yang, S. Kim, I. Bacivarov, S. Ha, L. Thiele, Static mapping of mixed-critical applications for...
  • K. Cao et al., A survey of optimization techniques for thermal-aware 3D processors, J. Syst. Arch. (2019)
  • P. Huang et al., On the Scheduling of Fault-Tolerant Mixed-Criticality Systems, Technical Report 351, ETH Zurich, Laboratory TIK (2013)
  • J. Lin et al., Scheduling mixed-criticality real-time tasks in a fault-tolerant system, Int. J. Embedded Real-Time Commun. Syst. (2015)
  • L. Zeng, P. Huang, L. Thiele, Towards the design of fault-tolerant mixed-criticality systems on multicores, in:...
  • Z. Al-bayati, J. Caplan, B. Meyer, H. Zeng, A four-mode model for efficient fault-tolerant mixed-criticality systems,...
  • J. Zhou et al., Fault-tolerant task scheduling for mixed-criticality real-time systems, J. Circuits Syst. Comput. (2017)
  • A. Taherin et al., Reliability-aware energy management in mixed-criticality systems, IEEE Trans. Sustain. Comput. (2018)
  • S. Baruah, H. Li, L. Stougie, Towards the design of certifiable mixed-criticality systems, in: Proceedings of the IEEE...
  • P. Huang et al., Service Adaptions for Mixed-Criticality Systems, Technical Report 350, ETH Zurich, Laboratory TIK (2014)
  • K. Skadron et al., Temperature-aware microarchitecture: Modeling and implementation, ACM Trans. Archit. Code Optim. (2004)
  • H. Huang et al., Throughput maximization for periodic real-time systems under the maximal temperature constraint, ACM Trans. Embedded Comput. Syst. (2014)
  • S. Saha, Y. Lu, J. Deogun, Thermal-constrained energy-aware partitioning for heterogeneous multi-core multiprocessor...
    Guo Xu is currently pursuing the master's degree with the Department of Computer Science and Technology, East China Normal University, Shanghai, China. His current research interest is in the area of power management in mobile devices.

    Junlong Zhou received the Ph.D. degree in Computer Science from East China Normal University, Shanghai, China, in 2017. He was a Visiting Scholar with the University of Notre Dame, Notre Dame, IN, USA, during 2014–2015. He is currently an Assistant Professor with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China. His research interests include real-time embedded systems, cloud computing and IoT, and cyber physical systems, where he has published more than 40 refereed papers. Dr. Zhou is an active reviewer for more than 25 international journals. He received the Reviewer Award from Journal of Circuits, Systems, and Computers, in 2016. He has served as a publication chair, publicity chair, section chair, and TPC member for numerous conferences. He has been an Associate Editor for the Journal of Circuits, Systems, and Computers, and serves as a Guest Editor for several special issues of ACM Transactions on Cyber–Physical Systems, IET Cyber–Physical Systems: Theory & Applications, and Elsevier Journal of Systems Architecture: EMBEDDED SOFTWARE DESIGN. He is a member of the IEEE.

    Mingsong Chen received the B.S. and M.E. degrees from Department of Computer Science and Technology, Nanjing University, Nanjing, China, in 2003 and 2006 respectively, and the Ph.D. degree in Computer Engineering from the University of Florida, Gainesville, in 2010. He is currently a full Professor with the Department of Embedded Software and Systems of East China Normal University. His research interests are in the area of design automation of cyber–physical systems, formal verification techniques and mobile cloud computing. He is a member of the IEEE.

    Tongquan Wei received his Ph.D. degree in Electrical Engineering from Michigan Technological University in 2009. He is currently an Associate Professor in the Department of Computer Science and Technology at the East China Normal University. His research interests are in the areas of Internet of Things, edge computing, cloud computing, and design automation of intelligent and CPS systems. He has served as a Regional Editor for Journal of Circuits, Systems, and Computers since 2012. He is a member of the IEEE.

    Keqin Li is a SUNY Distinguished Professor of computer science in the State University of New York. His current research interests include parallel computing and high performance computing, distributed computing, energy-efficient computing and communication, heterogeneous computing systems, cloud computing, big data computing, CPU–GPU hybrid and cooperative computing, multicore computing, storage and file systems, wireless communication networks, sensor networks, peer-to-peer file sharing systems, mobile computing, service computing, Internet of things and cyber–physical systems. He has published over 590 journal articles, book chapters, and refereed conference papers, and has received several best paper awards. He is currently serving or has served on the editorial boards of IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, IEEE Transactions on Cloud Computing, IEEE Transactions on Services Computing, and IEEE Transactions on Sustainable Computing. He is an IEEE Fellow.

    This work is partially supported by Shanghai Municipal Natural Science Foundation under Grant 16ZR1409000, National Natural Science Foundation of China under Grants 61802185 and 61872147, Natural Science Foundation of Jiangsu Province under Grant BK20180470, and Fundamental Research Funds for the Central Universities under Grant No. 30919011233.

