Scenario-based quasi-static task mapping and scheduling for temperature-efficient MPSoC design under process variation

https://doi.org/10.1016/j.micpro.2014.05.006

Abstract

Nowadays, worst-case analysis is the most common approach to providing unified static task mapping–scheduling plans on MPSoCs. Since worst-case methods explore neither the whole design space nor a subset of it, these approaches may fail to achieve an efficient performance yield. In this paper, we present a temperature-aware quasi-static task mapping–scheduling framework under process variation for hard real-time periodic systems on MPSoCs. By employing stochastic optimization and scenario-based approaches, we explore a few representative scenarios in the whole design space of the chip using the probability density functions of the problem's random variables. Then, we obtain a compact set of near-optimal mapping–scheduling plans for real-time tasks that targets maximization of performance yield and minimization of the expected value of peak temperature. Consequently, considering different chip parameter configurations, we construct the plan set from the solutions that attain the best variation-aware task mapping–scheduling, satisfying the deadline and minimizing the temperature. This plan set can readily be looked up at run time by the system scheduler of the chip to find the proper plan for the tasks based on the run-time parameters. The experimental results demonstrate significant improvements in performance yield and peak temperature for almost all of the test cases on homogeneous and heterogeneous MPSoCs.

Introduction

Technology scaling of transistor features enables the integration of multiple heterogeneous processors on a single die, known as a Multi-Processor System on Chip (MPSoC). This aggressive scaling may lead to higher power density and temperature, which in turn cause localized high-temperature regions known as thermal hot spots. Thermal hot spots have adverse effects on the operation of the chip, such as:

  • Decreasing reliability: Failure mechanisms such as electromigration and hot-carrier injection become more frequent as thermal hot spots grow [1]. According to [2], a change in operating temperature of 10–15 °C results in a 2× difference in the lifespan of devices. Moreover, thermal expansion, especially at hot spots, may lead to uneven chip expansion and, as a result, may physically break the chip.

  • Increasing power consumption: Temperature and leakage power consumption form a positive feedback loop. This leakage may account for up to 60% of the total energy consumption in deep sub-micron technologies [3].

Consequently, temperature has become an issue of paramount importance due to its direct or indirect role in power consumption, reliability, and cooling cost, so it should be considered at an early phase of MPSoC design.

On the other hand, continued scaling of deep sub-micron technologies reveals new challenges such as soft errors, device aging, and process variation [4]. Process variation complicates the chip design process by imposing uncertainty on key transistor parameters such as channel length, gate-oxide thickness, and threshold voltage. Although several works [5], [6] are devoted to hiding system-level process variation at the logic and arithmetic levels, Borkar et al. [7] and Miranda et al. [8] report up to 30% and 40% variation in the frequency of processors fabricated in 180 nm and 32 nm technologies, respectively. This trend demonstrates that logic-level and arithmetic-level techniques alone are not sufficient to resolve this issue. Thus, process variation, and the resulting higher variability in key parameters of the fabricated chip, remains a matter of great importance, since it may significantly impact not only the performance-yield but also the temperature and power consumption of highly scaled chips.

Thus, the efficient design of variation-aware task mapping–scheduling with hard real-time constraints, which keeps the peak temperature below a threshold, can be a formidable problem. System-level design has two main steps. In the first step, the given application is partitioned into hardware and software tasks based on the system constraints. Hardware tasks are then implemented on ASICs or FPGAs, and software tasks on general-purpose processors, DSPs, or accelerators. In the second step, system tasks are assigned to the available processing elements to optimally utilize the system resources. In addition, the tasks are scheduled so as to satisfy system constraints such as deadline and power consumption [9].

Various approaches have been proposed for mapping–scheduling of tasks, which may be classified into three categories: static, quasi-static, and dynamic methods. In static methods, decisions are made at “design time” and a unique mapping–scheduling is obtained based on representative and reliable models of the target hardware architecture and application tasks. Static approaches usually use the average or the worst-case value of each parameter as a summary of the distribution function of process variation. Although designing for worst-case values of all chip parameters is safe, it is overly conservative, results in over-provisioning, and thus is not cost-effective. Additionally, design schemes based on average values can be unreliable when the parameters deviate from those averages. So, both of these approaches may fail to find solutions for some cases that require searching the whole design space. Consequently, static methods may be inappropriate for the fabricated chip, and their effectiveness may be questionable as the amount of process variation increases.

In quasi-static methods, more than one mapping–scheduling plan is generated at design time, and at run time one of them is selected based on the actual parameters of the fabricated chip. This method has the advantage that it trades off accuracy for complexity and attains a solution that is near-optimal in many cases. Finally, in dynamic methods, the mapping–scheduling decision is postponed to run time, and mapping–scheduling of tasks is carried out based on the features of the actual chip. Clearly, in dynamic methods, finding an optimal solution may become difficult, if not impossible, except for simple tasks. Fortunately, there exist studies in stochastic optimization, known as scenario-based optimization, that attempt to find more effective solutions when some variables are uncertain. Note that the term event in the literature [11] has the same meaning as scenario in this paper, and thus we use scenario throughout the paper. With a scenario-based method, it may suffice to explore offline a proper subset of the design space while obtaining a solution close enough to the optimal solution that would require exploring the whole space; the method thus trades off accuracy for practicality.
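The run-time selection step of a quasi-static method can be illustrated with a minimal sketch. It assumes, purely for illustration, that each design-time scenario is a tuple of per-core frequencies and that a plan built for slower cores remains feasible on faster ones; the names `Plan` and `select_plan`, and the conservative selection rule, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Plan:
    """One precomputed mapping-scheduling plan (hypothetical structure)."""
    mapping: Tuple[int, ...]  # mapping[t] = core index assigned to task t
    order: Tuple[int, ...]    # order in which tasks are dispatched


# A scenario is assumed to be a tuple of per-core frequencies (e.g. in GHz).
Scenario = Tuple[float, ...]


def select_plan(plan_set: Dict[Scenario, Plan],
                measured: Sequence[float]) -> Optional[Plan]:
    """Return the plan of the closest scenario that the chip dominates.

    Assumption: a scenario is 'safe' when none of its core frequencies
    exceeds the measured frequency of the same core. Among safe scenarios,
    the one with the least total frequency slack is chosen. Returns None
    when no stored scenario is safe for this chip.
    """
    safe = [s for s in plan_set
            if all(f_s <= f_m for f_s, f_m in zip(s, measured))]
    if not safe:
        return None
    best = min(safe, key=lambda s: sum(f_m - f_s
                                       for f_s, f_m in zip(s, measured)))
    return plan_set[best]
```

The lookup is a linear scan over the stored scenarios, which keeps the run-time cost small when the plan set is compact; a real scheduler could index the table differently.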

Based on scenario-based stochastic optimization, in this paper we present a hierarchical and statistical temperature-aware quasi-static task mapping–scheduling framework under process variation for hard real-time applications on MPSoCs. While most prior works address improving performance-yield and peak temperature separately, to the best of our knowledge this is the first work that considers the effects of process variation on frequency and leakage power when generating a temperature-aware task mapping–scheduling plan. Using the five-step framework, at the design phase we first find a set of plans that attempts to maximize performance-yield and minimize the expected value of peak temperature simultaneously. Furthermore, at run time, the system scheduler selects a proper plan from the plan set based on the actual variation of the chip, which can be obtained by mature speed-binning techniques, along with information about the selected plans.

Also, it is worth mentioning that the wear-out effect is an important issue, since it changes the chip's parameters and in turn the frequency map of the MPSoC. The proposed framework addresses this issue as well: it suffices to monitor the parameter variation due to wear-out intermittently and look up the suitable plan in the preconfigured plan set based on the new parameters of the chip. Moreover, our framework can be easily configured for large systems using some design parameters. To reduce the execution time of the framework, a random weighted scenario selection algorithm is used. The framework is also independent of the temperature-aware scheduling algorithm, so our algorithm can be replaced with any other. Finally, we extensively evaluate the proposed task mapping and scheduling framework on a 4-core MPSoC using the E3S benchmark suite. The results clearly demonstrate the enhancement compared to baseline approaches.
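The idea of random weighted scenario selection can be sketched as follows. This is only one plausible reading, assuming each scenario carries a positive probability weight and that n distinct scenarios are drawn without replacement; the function name `select_scenarios` and its parameters are hypothetical, and the paper's actual priority criterion is not reproduced here.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_scenarios(scenarios: Sequence[T],
                     probabilities: Sequence[float],
                     n: int,
                     seed: Optional[int] = None) -> List[T]:
    """Draw n distinct scenarios, favouring the more probable ones.

    Assumptions: probabilities are positive weights (they need not sum
    to one) and sampling is done without replacement.
    """
    if len(scenarios) != len(probabilities):
        raise ValueError("one probability is required per scenario")
    if not 0 <= n <= len(scenarios):
        raise ValueError("n must be between 0 and the number of scenarios")

    rng = random.Random(seed)
    remaining = list(zip(scenarios, probabilities))
    chosen: List[T] = []
    for _ in range(n):
        weights = [p for _, p in remaining]
        # Pick one index in proportion to the remaining weights.
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(idx)[0])
    return chosen
```

Drawing without replacement avoids spending the scenario budget on duplicates, while the weights still bias the draw toward chip configurations that are more likely to be fabricated.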

The remainder of this paper is organized as follows. In Section 2 we review some related works. Section 3 covers the preliminaries and the problem definition. Our motivating example is explained in Section 4. The proposed methodology is described in Section 5. Section 6 discusses the effects of variation in ambient temperature and the complexity of the framework. Experimental results are provided in Section 7. Finally, Section 8 concludes the paper and mentions some future directions.


Related work

The multi-processor task mapping–scheduling problem has been proven to be NP-complete [14], and it becomes harder under process variation. Many heuristic solutions have been proposed in the literature to address this problem, and a few consider process variation [10], [11], [12]. A new design metric called performance-yield was introduced in [10]; it indicates the probability that the design meets the timing constraints. In their work, the authors used statistical analysis on task graphs
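To make the performance-yield metric concrete, the sketch below estimates it by Monte Carlo sampling as the fraction of sampled chips that meet the deadline. It is a simplified illustration, not the statistical analysis of [10]: it assumes independent tasks (no task-graph dependencies), per-core frequencies drawn independently from a normal distribution, and a hypothetical function name and parameter set.

```python
import random
from typing import Sequence


def estimate_performance_yield(task_cycles: Sequence[float],
                               mapping: Sequence[int],
                               nominal_freq: float,
                               sigma_ratio: float,
                               deadline: float,
                               samples: int = 10_000,
                               seed: int = 0) -> float:
    """Estimate P(completion time <= deadline) under frequency variation.

    Assumptions (illustrative only): tasks are independent, each core runs
    its tasks back to back, and each core's frequency is normally
    distributed around nominal_freq with std. dev. sigma_ratio * nominal_freq.
    """
    rng = random.Random(seed)
    num_cores = max(mapping) + 1

    # Total cycle count assigned to each core by the mapping.
    load = [0.0] * num_cores
    for cycles, core in zip(task_cycles, mapping):
        load[core] += cycles

    met = 0
    for _ in range(samples):
        # Sample one chip instance; clamp to keep frequencies positive.
        freqs = [max(rng.gauss(nominal_freq, sigma_ratio * nominal_freq), 1e-9)
                 for _ in range(num_cores)]
        finish = max(c / f for c, f in zip(load, freqs))
        if finish <= deadline:
            met += 1
    return met / samples
```

With `sigma_ratio` set to zero the estimate collapses to a deterministic pass or fail check, which is a convenient sanity test before adding variation.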

Problem and model definition

In this section, we first briefly describe the definitions and models used in this paper, including the process variation model, the frequency, leakage power, and thermal models, the scenario-based stochastic method, and the system scenario model. Finally, in the last subsection, we introduce the main problem targeted by this study.

Motivating examples

Before discussing the proposed framework for the task mapping–scheduling problem, we investigate whether the laborious approach of dealing with the whole distribution function of the parameter variation is worthwhile compared to the simpler approach of using only summarization methods (e.g., average case or worst case) to deal with process variation. To do so, we provide two motivating examples. In the following we compare the proposed method with the traditional worst case-based

Quasi-static task mapping–scheduling framework

In this section, we explain the temperature-aware quasi-static task mapping–scheduling framework under process variation. A brief overview of the proposed framework, which consists of five steps, is portrayed in Fig. 6. In the first step (named “Scenario Generation”), the reference table of the scenarios is created. Then, in the “Scenario Selection” step, n scenarios are chosen from the scenario reference table based on a priority criterion. After that, an arbitrary temperature-aware task

Discussion

In this section, we discuss how one can mitigate the destructive effects of variation in ambient temperature on our proposed quasi-static framework. Then, we quantify the performance of the presented framework in terms of time and space complexity.

Experimental results

In this section, we evaluate the efficiencies and improvements of the proposed framework. We begin with explaining the experimental setup, and continue with the evaluations.

Conclusion

Dealing with process variation has become an issue of increasing importance in designing future highly scaled chips, as it causes exponential growth of the design space. In this paper we presented a scenario-based stochastic optimization framework to allocate and schedule periodic tasks with time constraints that achieves low temperature and efficient performance yield. Based on the probability distribution of chip parameters the presented framework partitions offline the design space into some

Behnam Khodabandeloo received his BSc and MSc in Computer Engineering from University of Tehran, Iran. He is currently a Research Assistant in the School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Iran. His research interests are applications of optimization theory and approximation algorithms in digital systems design.

References (32)

  • E.L. Lawler et al., Scheduling periodically occurring tasks on multiple processors, Inf. Process. Lett. (1981)
  • JEDEC Solid State Technology Association, Failure Mechanisms and Models for Semiconductor Devices, JEDEC Publication...
  • R. Viswanath et al., Thermal performance challenges from silicon to systems, Intel Technol. J. (2000)
  • International technology roadmap for semiconductors, 2010,...
  • S. Borkar, Designing reliable systems from unreliable components: the challenges of transistor variability and degradation, IEEE Micro (2005)
  • F. Wang, X. Wu, Y. Xie, Variability-driven module selection with joint design time optimization and post-silicon...
  • A. Agarwal, D. Blaauw, V. Zolotov, Statistical timing analysis for intra-die process variations with spatial...
  • S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, V. De, Parameter variations and impact on circuits and...
  • M. Miranda, M. Corbalan, B. Dierickx, P. Zuber, P. Dobrovolny, F. Kutscherauer, P. Roussel, P. Poliakov, Variability...
  • S. Banerjee, N. Dutt, Efficient search space exploration for HW-SW partitioning, in: International Conference on...
  • F. Wang, C. Nicopoulos, X. Wu, Y. Xie, N. Vijaykrishnan, Variation-aware task allocation and scheduling for MPSoC, in:...
  • H. Chon, T. Kim, Timing variation-aware task scheduling and binding for MPSoC, in: Proc. ASP-DAC, 2009, pp....
  • L. Huang, Q. Xu, Performance yield-driven task allocation and scheduling for MPSoCs under process variation, in: Proc....
  • ...
  • G.C. Sih et al., A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures, IEEE Trans. Parallel Distrib. Syst. (1993)
  • Y. Xie et al., Temperature-aware task allocation and scheduling for embedded multiprocessor systems-on-chip (MPSoC) design, J. VLSI Signal Process. (2006)
    Ahmad Khonsari received the BSc in Electrical and Computer Engineering from Shahid-Beheshti University, Iran, in 1991, and MSc in Computer Engineering from the IUST, Iran, in 1996 and PhD in Computer Science from the University of Glasgow, UK, in 2003. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, University of Tehran, Iran and a researcher in School of Computer Science, IPM, Iran. His research interests are performance modelling/evaluation, wired/wireless networks, distributed systems, and high performance computer architecture.

    Farzad Gholamian received the BSc in Electrical Engineering from the University of Science and Technology (IUST), Iran, in 2007. He is currently an MSc student in the School of Electrical and Computer Engineering of the University of Tehran, Iran. His current research interests include computer architecture and temperature-aware SoC design.

    Mohammad H. Hajiesmaili received his BSc in Computer Engineering from Sharif University of Technology, Iran, in 2007, and his MSc in Computer Engineering from the University of Tehran, Iran, in 2009. He is currently a Research Assistant in the School of Computer Science, Institute for Research in Fundamental Sciences (IPM), and a PhD student at the University of Tehran, Iran. His research interests are applications of optimization theory in wired and wireless networks.

    Aminollah Mahabadi received his BSc degree in Electrical Engineering (Computer Hardware) from Iran Science and Technology University (Iran), in 1990, and MSc degree in Computer Engineering (Computer Architecture) from Amirkabir University (Iran), in 1996. He is currently assistant professor of Electrical and Engineering at Shahed University, Tehran, Iran. His research interests are in ITS, image processing, NoCs and SoCs, parallel and distributed systems, and simulation.

    Hamid Noori received the BSc degree from Sharif University of Technology, Iran, in 1996 and the MSc degree in Computer Systems Architecture from the Amirkabir University of Technology, Iran, in 1999. He received his PhD degree from the Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan, in 2007. From January 2009 to January 2011, he was with the School of Electrical and Computer Engineering, Faculty of Engineering, University of Tehran, as an assistant professor. Currently, he is an assistant professor at the School of Electrical and Computer Engineering, Engineering Department, Ferdowsi University of Mashhad. His research interests include customizable and reconfigurable embedded processors, multi-core processors, and multi-processor systems on chip (MPSoC).
