Energy minimization for on-line real-time scheduling with reliability awareness

https://doi.org/10.1016/j.jss.2017.02.004

Highlights

  • Proposed closed-form formulas to quantify static and dynamic system reliability.

  • Proposed an energy-efficient on-line algorithm under a reliability constraint.

  • Theoretically analyzed the performance of the proposed algorithm.

  • The proposed algorithm significantly outperforms existing related work.

Abstract

With the current development of semiconductor technology, transistor density on a single processing chip is increasing exponentially. This aggressive transistor integration significantly boosts computing performance. However, it also results in a power explosion, which directly degrades system reliability. Moreover, some well-known power/energy reduction techniques, e.g., Dynamic Voltage and Frequency Scaling (DVFS), can adversely affect system reliability. How to manage power/energy consumption effectively while keeping system reliability under control is critical for the design of high-performance computing systems. In this paper, we present an on-line power management approach that minimizes energy consumption for single-processor real-time scheduling under a reliability constraint. We formally prove that the proposed algorithm guarantees the system reliability requirement. Our simulation results show that, by exploiting run-time dynamics, the proposed approach achieves greater energy savings than previous work under the reliability constraint.

Introduction

Embedded computing systems have grown rapidly in both scale and complexity over the last decade. This advancement is mainly rooted in the development of transistor scaling technology. Today, hundreds of billions of transistors can be integrated into a single chip, which directly boosts computing performance. However, one key problem, as a consequence of the aggressive scaling of transistor size, is the large increase in power within a single processing chip. The increased power consumption in turn poses severe constraints on both the design and the implementation of computing systems.

Real-time embedded systems, a class of embedded systems dedicated to special applications with real-time constraints, are widely used in daily life. They can be found in mobile phones, electronic game devices, motor vehicles, medical equipment, etc. Mobile phones, for example, have strict restrictions on size, weight, temperature, and power/energy. Power/energy is particularly important, as these portable devices depend largely on battery life to deliver high performance and service quality (Zhang et al., 2009). Although computing performance has increased continuously, the power/energy issue has become more critical in the design of real-time embedded systems.

Dynamic voltage and frequency scaling (DVFS) is one of the most commonly used techniques for power/energy management, and it has been well studied in Jejurikar and Gupta (2004) and Yao et al. (1995). With DVFS-enabled processors, the supply voltage and frequency are lowered at run-time to achieve energy savings. However, as shown in earlier studies (Zhu and Aydin, 2006; Zhu et al., 2004), DVFS can adversely affect system reliability: reducing the voltage and operating frequency of a processor exacerbates the reliability problem. For example, it is reported that the transient fault rate of a processor usually increases by several orders of magnitude under low voltage/frequency conditions. A transient fault is a temporary malfunction of a processor, usually caused by electromagnetic interference or cosmic ray radiation, that can lead to temporary errors in computation and corruption of data (Srinivasan et al., 2004; Shivakumar et al., 2002; Ernst et al., 2004). Moreover, with the increased complexity of both system architectures and applications, the reliability issue is becoming more challenging. Appropriate real-time scheduling strategies that account for both energy and reliability, particularly for embedded systems, are therefore desired.
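To make the DVFS-reliability trade-off concrete, the following minimal sketch uses an exponential fault-rate model of the form commonly seen in this line of work, λ(f) = λ0 · 10^(d(1−f)/(1−f_min)), with the frequency normalized so that the maximum is 1. The constants `LAMBDA0`, `D`, and `F_MIN` and both function names are illustrative assumptions, not values or definitions taken from this paper.

```python
import math

# Illustrative constants (assumptions, not taken from the paper).
LAMBDA0 = 1e-6   # fault rate at maximum frequency (faults per time unit)
D = 3.0          # sensitivity exponent: the rate is 10**D times higher at F_MIN
F_MIN = 0.4      # lowest normalized frequency; the maximum frequency is 1.0


def fault_rate(f: float) -> float:
    """Transient fault rate at normalized frequency f (F_MIN <= f <= 1)."""
    if not F_MIN <= f <= 1.0:
        raise ValueError("frequency out of range")
    return LAMBDA0 * 10 ** (D * (1.0 - f) / (1.0 - F_MIN))


def task_reliability(wcet_at_fmax: float, f: float) -> float:
    """Probability that a task completes without a transient fault.

    Running at frequency f stretches the execution time to wcet_at_fmax / f,
    so lowering f hurts twice: a higher fault rate and a longer exposure time.
    """
    return math.exp(-fault_rate(f) * wcet_at_fmax / f)
```

Under these assumed constants, a task run at `F_MIN` sees a fault rate three orders of magnitude above the full-speed rate, which is the kind of gap that motivates reserving recovery time when frequencies are scaled down.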

Several studies have been published on reliability-aware power/energy management for real-time embedded systems (Zhu and Aydin, 2006; Zhu et al., 2004; Zhao et al., 2011; Baoxian Zhao and Zhu, 2009; Han et al., 2016; Zheng et al., 2015; Shah et al., 2016). Zhu et al. (2004) proposed a new fault rate model that considers frequency effects together with execution time. Zhu and Aydin (2006) applied this fault model to manage voltage and frequency by reserving backup blocks for specific tasks, such that energy is minimized and the reliability requirement is satisfied. Baoxian Zhao and Zhu (2009) further improved this approach by reserving processor resources that can be shared by multiple tasks. Han et al. (2016) presented an approach to pinpoint the peak temperature of a given periodic multicore DVFS schedule. All these approaches share a common drawback: task speed assignments are determined statically. In other words, the frequency assignment is predetermined and no run-time information is taken into account.

It is a well-known fact that, in real-time systems, there is usually a large difference between the worst-case and the best-case execution time of the same real-time task. An approach that takes advantage of these run-time dynamics can therefore be very effective in saving energy. Accordingly, we study how to employ on-line scheduling techniques to save energy without degrading system reliability. Specifically, in this paper, we present an on-line reliability-aware dynamic power management approach to schedule frame-based real-time tasks (which share the same deadline but have different execution times) on a single-processor platform. The proposed algorithm reduces energy consumption by dynamically recycling redundant resources and, based on the recycled resources, readjusting the frequency for the rest of the workload. Compared with existing work, we make the following contributions:

  • First, we observe that the system reliability varies as real-time tasks execute. Taking this on-line property into consideration, we satisfy the reliability requirement through an on-line approach instead of guaranteeing the system's original reliability through an off-line approach. To the best of our knowledge, this is the first paper that considers system reliability from an on-line perspective.

  • Secondly, by dynamically recycling the reserved computing resources, our proposed algorithm can effectively exploit run-time slack to adjust the frequencies of real-time tasks such that the system energy consumption is minimized without compromising system reliability.

  • Thirdly, we conducted extensive experiments to study the performance of our approach. The experimental results demonstrate that our proposed algorithm can significantly reduce energy consumption compared with previous work.
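The idea of exploiting run-time slack for frame-based tasks can be illustrated with a minimal sketch. It is not the RA-DPM algorithm: it deliberately omits the recovery blocks and the reliability check, and only shows how slack left by early-finishing tasks lets later tasks run slower. The function name `reclaim_slack`, the cubic power model, and the default `f_min` are assumptions made for illustration.

```python
def reclaim_slack(wcets, actuals, deadline, f_min=0.4):
    """Run frame-based tasks in order, re-picking the frequency before each one.

    wcets / actuals: worst-case and actual execution times at full speed (f = 1).
    Returns (chosen frequencies, total energy, finish time).
    """
    if sum(wcets) > deadline:
        raise ValueError("task set is infeasible even at full speed")
    now, energy, freqs = 0.0, 0.0, []
    for i, actual in enumerate(actuals):
        remaining_wcet = sum(wcets[i:])      # worst-case work still to run
        # Slowest speed that still meets the common deadline in the worst case,
        # clamped to the processor's frequency range [f_min, 1].
        f = min(1.0, max(f_min, remaining_wcet / (deadline - now)))
        duration = actual / f                # actual run time at speed f
        energy += f ** 3 * duration          # assumed cubic power model
        now += duration
        freqs.append(f)
    return freqs, energy, now


# Three tasks that each finish in half their worst-case time.
freqs, energy, finish = reclaim_slack([2, 2, 2], [1, 1, 1], deadline=10)
```

In this example the first task runs at the statically safe speed, and each later task runs at a lower speed because its predecessors finished early; a static assignment would keep all three at the first task's speed and consume more energy under the same assumed power model.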

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the preliminaries necessary for this paper. Section 4 motivates this research with an example and then formulates the research problem. Section 5 presents the proposed reliability-aware dynamic power management algorithm. Experiments and results are discussed in Section 6, and Section 7 concludes the paper.

Section snippets

Related work

Many researchers have proposed approaches to energy-related problems that take fault tolerance into account. Elnozahy et al. (2002) derived a simple theory for power management in the context of duplex and triple modular redundancy systems. In their approach, recovery backup blocks were used to reserve sufficient time to recover the duplex system from one fault. Unsal et al. (2002) proposed an energy-aware fault-tolerance heuristic, through which the backup tasks were postponed

Preliminary

In this section, we first introduce the system models used in this paper, which include the task model, the fault model, the reliability model, and the power and energy models. Then we use an example to motivate our research.

Motivational example and problem formulation

In this section, we first use an example to motivate our research problem, and then formulate it.

Reliability-aware dynamic power management algorithm

The Reliability-Aware Dynamic Power Management (RA-DPM) algorithm is an on-line power management algorithm that dynamically adjusts the tasks’ frequencies by utilizing potentially redundant recovery blocks at run-time. The system's original reliability obtained by Eq. (3) can be guaranteed, and the original recovery requirement can be satisfied. To this end, we first introduce the system on-line reliability, which estimates the system reliability dynamically, then discuss our new algorithm and analyze the

Experimental evaluation

In this section, we investigate the performance of our proposed algorithm through simulations. We compare RA-DPM with the two most recent reliability-aware power/energy management algorithms, namely IRCS (Zhao et al., 2011) and RAPM (Zhu and Aydin, 2006). Four schemes are evaluated in our simulation.

  • The IRCS scheme which is a search algorithm that assigns frequencies iteratively to tasks based on their energy-reliability ratio (ERR) values denoting the energy savings per unit

Conclusions

In this paper, we presented and evaluated a novel reliability-aware power management algorithm called RA-DPM, which aims to minimize energy consumption while keeping the system’s reliability at a desired level. RA-DPM dynamically adjusts the tasks’ frequencies by utilizing run-time slack, which may be increased by recycling redundant recovery blocks. Unlike existing work, in which task frequency assignments are predetermined, RA-DPM is more flexible and adaptive.

References (27)

  • Z. Li et al., Energy minimization for reliability-guaranteed real-time applications using DVFS and checkpointing techniques, J. Syst. Archit. (2015)

  • L. Niu et al., Reliability-aware energy minimization for real-time embedded systems with window-constraints, SIGBED Rev. (2013)

  • H.A. Baoxian Zhao et al., Enhanced reliability-aware power management through shared recovery technique, Computer-Aided Design - Digest of Technical Papers, 2009. ICCAD 2009. IEEE/ACM International Conference on (2009)

  • T. Burd et al., Energy efficient CMOS microprocessor design, System Sciences, 1995. Proceedings of the Twenty-Eighth Hawaii International Conference on (1995)

  • X. Castillo et al., Derivation and calibration of a transient error reliability model, IEEE Trans. Comput. (1982)

  • E. Elnozahy et al., Energy-efficient duplex and TMR real-time systems, Real-Time Systems Symposium, 2002. RTSS 2002. 23rd IEEE (2002)

  • D. Ernst et al., Razor: circuit-level correction of timing errors for low-power operation, IEEE Micro (2004)

  • Q. Han et al., Temperature-constrained feasibility analysis for multi-core scheduling, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. (2016)

  • Q. Han et al., Energy minimization for fault tolerant real-time applications on multiprocessor platforms using checkpointing, International Symposium on Low Power Electronics and Design (ISLPED) (2013)

  • R. Jejurikar et al., Dynamic voltage scaling for systemwide energy minimization in real-time embedded systems, Low Power Electronics and Design, 2004. ISLPED ’04. Proceedings of the 2004 International Symposium on (2004)

  • Z. Li et al., Reliability guaranteed energy minimization on mixed-criticality systems, J. Syst. Softw. (2016)

  • S.M.A. Shah et al., Robustness testing of embedded software systems: an industrial interview study, IEEE Access (2016)

Dr. Ming Fan is an R&D engineer at Broadcom Limited. He received his Ph.D. from the Department of Electrical and Computer Engineering at Florida International University, Florida, USA, in 2014. He received both his B.S. and M.S. from the Department of Software Engineering at Beihang University, Beijing, China, in 2006 and 2009, respectively. His research interests include real-time systems, power-/thermal-aware computing, fault-tolerant system design, computer networks, and IC design and verification.

Dr. Qiushi Han is an R&D engineer at Broadcom Limited. He received his Ph.D. from the Department of Electrical and Computer Engineering at Florida International University, Florida, USA. He received his B.S. from the Department of Software Engineering, Beijing Jiaotong University. His research interests include real-time systems, power-/thermal-aware computing, and reliable/fault-tolerant system design.

Dr. Xiaokun Yang is an Assistant Professor at the University of Houston Clear Lake. He received his Ph.D. from Florida International University (FIU), USA, with a specialization in VLSI, in May 2016, and his dual M.S. degree in electrical and computing engineering from FIU and Beihang University, China, in 2007.
