Migration-aware adaptive MPSoC static schedules with dynamic reconfigurability

https://doi.org/10.1016/j.jpdc.2011.06.006

Abstract

Technology scaling in semiconductors has enabled the integration of dozens of processing elements (PEs) onto a single chip (MPSoC). Scheduling application tasks onto a target MPSoC has been widely studied in the literature. Both technology scaling and resource competition among applications cause the resources available to an application to vary at runtime. Adaptive static schedules with predictable responses to runtime resource variations have consequently been proposed, but a large number of task migrations upon PE failures in such a reconfigurable schedule scheme leads to excessive migration cost among processors and to performance degradation. In this paper, we present an algorithm that reduces the number of task migrations while retaining the benefits of the earlier techniques. By embedding several soft constraints into the baseline heuristic scheduling algorithm, the proposed algorithm significantly decreases the number of task migrations while preserving the advantages of the initial dynamically reconfigurable schedule scheme. The proposed technique is evaluated by incorporating it into a well-known heuristic scheduling algorithm. The simulation results confirm its effectiveness in minimizing the number of task migrations during dynamic reconfiguration.

Highlights

► Task migration is a critical aspect of a predictable adaptive schedule. ► A two-line method is proposed for reducing the overhead of task migration. ► The MNTM algorithm is proposed for minimizing the overhead of task migration. ► Highly regular task migration is preserved. ► The fullest utilization of the available hardware resources is preserved.

Introduction

Technology scaling has enabled the integration of an increasing number of processing elements (PEs) onto a single chip (MPSoC) [28]. This increasing computational power in turn allows a higher number of applications to be integrated and executed simultaneously on the chip. The execution environment is becoming more dynamic and unpredictable, in that the amount of computational resources available to an application may vary due to either application complexity or the degradation of hardware reliability. Technology scaling has not only increased the probability of device failures during execution [26], but has also led to higher temperatures and increased thermal gradients, both of which may cause an overheated processor to be temporarily unavailable at runtime. The variations in the resources available to an application during its execution require new considerations when scheduling tasks onto the processing elements.

With this increasing probability of variations in processor availability, the fundamental challenge in developing an adaptive MPSoC platform is to make intelligent resource reallocation decisions with negligible overhead. The consideration of deadline constraints of embedded applications requires the reconfiguration process to be very fast, with highly predictable impact on each individual application. To fulfill these constraints, adaptive MPSoC schedules have been proposed [30]. With pre-optimized reconfiguration decisions embedded, such schedules deliver predictable responses upon runtime resource variations, with no need for any rescheduling decisions on the fly.

When an adaptive schedule is put in place, one important aspect that needs to be considered is task migration. Task migration increases potential inter-processor communication, which in turn may increase the schedule execution time. At the same time, transferring tasks from one PE to another also increases the cost of memory accesses, which inevitably affects the task execution time. The performance overhead introduced by task migration often leads to higher energy consumption and timing violations. Higher energy consumption increases the probability of device failures due to higher temperatures or increased thermal gradients. Timing violations between dependent tasks may cause the tasks to miss their deadlines. With the advent of MPSoC, task migration has been a critical topic both in research and in product development because the distinctive features of single-chip multiprocessors pose new challenges to the implementation [3]. Detailed studies can be found in [3], [12], [5]. Technology scaling provides an integration capacity of billions of transistors, but such a large density of transistors integrated on a single chip leads to higher power consumption as well as higher temperatures [15]. Higher temperatures, in turn, present a significant challenge for reliability, especially PE failures due to overheating at runtime. Overloading a PE with tasks may cause the processor to overheat, making it temporarily unavailable during execution as a result of clock gating. To avoid this unreliability, load balancing should be considered during scheduling.

In this paper, we present a reconfigurable scheduling technique that also minimizes the number of task migrations and balances the workload of PEs while retaining the benefits of adaptive static schedules. By embedding several soft constraints into the baseline heuristic scheduling algorithm, the proposed technique significantly decreases the maximum and average number of task migrations while preserving the advantages of the initial reconfigurable schedule scheme. The proposed technique is evaluated by incorporating it into a well-known heuristic scheduling algorithm, and the results of the algorithmic implementation confirm its effectiveness in the context of a single processor deallocation.

The remainder of the paper is organized as follows. Section 2 presents related work. Section 3 describes the proposed migration-aware static schedules as well as the two-line method of task migration, and analyzes all possible dependencies among tasks under the two-line condition. Section 4 introduces the details of the proposed algorithm. Section 5 presents the experimental results. Section 6 concludes the paper.

Section snippets

Related work

Embedded systems, such as MARS [17] and XBW [8], use static scheduling to ensure timing predictability. Without considering fault tolerance, many task scheduling algorithms have been proposed for parallel systems [2], [1], [9], [22], [18], [20], [19], [24], [27], [21]. Considering transient fault tolerance, researchers have proposed techniques to generate static schedules [16], [23]. Sufficient slack is added to a schedule so that upon the detection of a transient fault, recovery can be carried out

Migration-aware scheduling

This section presents the proposed technique that minimizes the number of task migrations. First, we describe the models of the target machine and task graphs. Then we briefly introduce the technique presented in [30], followed by the main idea for minimizing task migration, namely the two-line method of task migration. We refer to the BB technique [30] as the one-line method of task migration. In order to analyze the problem clearly, an idealistic case is first
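The quantity being minimized can be illustrated with a small sketch. The Python code below is only an assumption-laden illustration, not the one-line or two-line method itself: the names `count_migrations`, `migration_stats`, `baseline`, and `reconfigured` are hypothetical, and a task is counted as migrated whenever its assigned PE differs between the original mapping and the mapping used after a single PE is deallocated.

```python
def count_migrations(before, after):
    """Count tasks whose assigned PE differs between two task-to-PE mappings."""
    return sum(1 for task, pe in before.items() if after[task] != pe)


def migration_stats(baseline, reconfigured):
    """Return (maximum, average) migration counts over single-PE deallocations.

    baseline:     {task: pe} mapping used when all PEs are available.
    reconfigured: {failed_pe: {task: pe}} mapping used once failed_pe is gone.
    """
    counts = []
    for failed_pe, mapping in reconfigured.items():
        # A valid reconfigured mapping must not place any task on the failed PE.
        if failed_pe in mapping.values():
            raise ValueError(f"task still mapped to deallocated PE {failed_pe}")
        counts.append(count_migrations(baseline, mapping))
    return max(counts), sum(counts) / len(counts)


# Hypothetical example: four tasks on three PEs (data invented for illustration).
baseline = {"t1": 0, "t2": 0, "t3": 1, "t4": 2}
reconfigured = {
    0: {"t1": 1, "t2": 2, "t3": 1, "t4": 2},  # PE 0 deallocated
    1: {"t1": 0, "t2": 0, "t3": 2, "t4": 2},  # PE 1 deallocated
    2: {"t1": 0, "t2": 1, "t3": 1, "t4": 0},  # PE 2 deallocated
}
max_migrations, avg_migrations = migration_stats(baseline, reconfigured)
```

Counting tasks that were forced off the deallocated PE as migrations is a choice made here for simplicity; a different accounting would change the numbers but not the structure of the comparison.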

Algorithm

In this section, we present a new scheduling algorithm, the Minimum Number of Task Migrations (MNTM), which can further reduce the number of task migrations based on the proposed two-line method. Given an application represented as a weighted DAG [30], the scheduling problem can be formulated as the association of a start time and a processor with each node of the DAG. A typical list scheduling algorithm often includes a task prioritization phase and a processor selection phase. The priorities
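The two phases of a list scheduler can be sketched as follows. This is a generic illustration under stated assumptions, not MNTM or FAST: the priority function (longest path to an exit task, often called the b-level), the earliest-finish-time processor selection, and all names (`b_levels`, `list_schedule`, `cost`, `edges`) are choices made for this example, and task costs are assumed to be strictly positive so that the priority order respects precedence.

```python
def b_levels(succ, cost):
    """Priority of each task: longest computation path from it to an exit task."""
    memo = {}

    def level(task):
        if task not in memo:
            tail = max((level(s) for s in succ.get(task, [])), default=0)
            memo[task] = cost[task] + tail
        return memo[task]

    for task in cost:
        level(task)
    return memo


def list_schedule(cost, edges, num_pes):
    """Return {task: (pe, start, finish)} for a weighted DAG.

    cost:  {task: computation cost}, assumed strictly positive.
    edges: {(u, v): communication cost}, paid only if u and v run on different PEs.
    """
    succ, pred = {}, {task: [] for task in cost}
    for (u, v) in edges:
        succ.setdefault(u, []).append(v)
        pred[v].append(u)

    # Phase 1: task prioritization. Descending b-level is a topological order
    # when all costs are positive, so predecessors are always scheduled first.
    prio = b_levels(succ, cost)
    order = sorted(cost, key=lambda task: -prio[task])

    # Phase 2: processor selection by earliest finish time.
    pe_free = [0] * num_pes
    sched = {}
    for task in order:
        best = None
        for pe in range(num_pes):
            ready = max(
                (sched[p][2] + (0 if sched[p][0] == pe else edges[(p, task)])
                 for p in pred[task]),
                default=0,
            )
            start = max(ready, pe_free[pe])
            finish = start + cost[task]
            if best is None or finish < best[2]:
                best = (pe, start, finish)
        sched[task] = best
        pe_free[best[0]] = best[2]
    return sched


# Hypothetical four-task DAG scheduled on two PEs.
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
schedule = list_schedule(cost, edges, num_pes=2)
makespan = max(finish for _, _, finish in schedule.values())
```

A migration-aware variant would presumably bias the processor selection phase with additional soft constraints; how MNTM does this is not shown in the text above, so it is left out of the sketch.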

Experiment results

We implemented the proposed MNTM algorithm, the baseline FAST algorithm [22], and the BB reconfiguration scheme in C++. This section presents a series of performance comparisons of all three algorithms. The experimental task graph set contains task graphs representing various parallel algorithms, such as a Laplace equation solver, Fourier transformation, and LU decomposition. In addition, the TGFF tool [10] is used to generate 200 random task graphs in order to represent a large spectrum of parallel

Conclusion

In this paper, we present a reconfigurable scheduling algorithm. It considers both the overhead on the reconfigurable schedule length and the task migration among different PEs at runtime due to either unpredictable device failures or thermal stress. Through adding a second line to the initial BB reconfigurable scheme and proposing a new scheduling algorithm, we can decrease both the maximum and average numbers of task migrations significantly in the case of PEs’ deallocation during runtime at

Acknowledgment

This work is partially supported by a grant from City University of Hong Kong [7002584] and by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China [Project No. CityU 123609].

Yuping Zhang received the B.E. and the M.S. degrees from the School of Computering, Wuhan University, Wuhan, Hubei Province, in 1992 and 1997, respectively. She is currently working toward the Ph.D. degree at the School of Computering, Wuhan University, Wuhan.

She has worked as a researcher at Wuhan University for over ten years. Her research interests include computer architecture, parallel and distributed computing, etc.

References (30)

  • T. Davidovic et al. Benchmark-problem instances for static scheduling of task graphs with communication delays on homogeneous multiprocessor systems. Computer and Operations Research (2006)
  • Y.-K. Kwok et al. Benchmarking and comparison of the task graph scheduling algorithms. Journal of Parallel and Distributed Computing (1999)
  • I. Ahmad. Cluster computing: A glance at recent events. IEEE Concurrency (2000)
  • I. Ahmad et al. On exploiting task duplication in parallel program scheduling. IEEE Transactions on Parallel and Distributed Systems (1998)
  • S. Bertozzi et al. Supporting task migration in multi-processor systems-on-chip: A feasibility study
  • F. Bower et al. Online diagnosis of hard faults in microprocessors. ACM Transactions on Architecture and Code Optimization (2007)
  • E.W. Briao, D. Barcelos, F. Wronski, F.R. Wagner, Impact of task migration in noc-based MPSoCs for soft real-time...
  • S. Chabridon et al. Failure detection algorithms for a reliable execution of parallel programs
  • M. Chean et al. The full-use-of-suitable-spares (FUSS) approach to hardware reconfiguration for fault-tolerant processor arrays. IEEE Transactions on Computers (1990)
  • V. Claesson et al. The XBW model for dependable real-time systems
  • R.P. Dick et al. TGFF: Task graphs for free
  • Y. Ding et al. A helper thread based EDP reduction scheme for adapting application execution in CMPS
  • L. Gantel et al. Multiprocessor task migration implementation in a reconfigurable platform
  • S. Ghosh et al. Fault-tolerance through scheduling of aperiodic tasks in hard real-time multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems (1997)
  • C. Gong et al. Loop transformations for fault detection in regular loops on massively parallel systems. IEEE Transactions on Parallel and Distributed Systems (1996)

    Chun Jason Xue received the B.S. degree in Computer Science and Engineering from the University of Texas, Arlington, in 1997 and the M.S. and Ph.D. degrees in Computer Science from the University of Texas, Dallas, in 2002 and 2007, respectively.

    He is now an assistant professor in the Department of Computer Science at the City University of Hong Kong. His research interests include memory and parallelism optimization for embedded systems, software/hardware co-design, real-time systems, and computer security.

    Chengmo Yang received her B.S. degree in Microelectronics from Peking University, China in 2003, and her M.S. and Ph.D. degrees in Computer Engineering from the University of California, San Diego in 2005 and 2010, respectively. She is currently an assistant professor in the Department of Electrical and Computer Engineering at the University of Delaware. Her research interests include reliability and adaptivity enhancement in multi and many core systems, power- and thermal-aware system design, as well as compiler-directed optimizations for embedded processors.

    Alex Orailoglu received his S.B. degree cum laude from Harvard College in Applied Mathematics and his M.S. and Ph.D. degrees in Computer Science from the University of Illinois, Urbana-Champaign. He is currently a professor of Computer Science and Engineering at the University of California, San Diego, where he leads the ART (Architecture, Reliability and Test) lab, focusing on Computer Architecture, Reliability, Embedded Processors and Systems, VLSI Test and NanoArchitectures. He has published over 250 papers and participates in the program, organizing and steering committees of major conferences in Embedded Systems, VLSI Test and NanoArchitectures.

    Dr. Orailoglu has served as the General Chair and the Program Chair for the IEEE/ACM/IFIP International Symposium on Hardware/Software Codesign and System Synthesis, the IEEE VLSI Test Symposium (VTS), the IEEE Symposium on Application Specific Processors (SASP), the IEEE/ACM International Symposium on NanoScale Architectures (NanoArch), and the IEEE International High Level Design Validation and Test Workshop. He has co-founded the IEEE Symposium on Application Specific Processors (SASP), the IEEE/ACM International Symposium on NanoScale Architectures (NanoArch), the IEEE International High Level Design Validation and Test Workshop and the HiPEAC Workshop on Design for Reliability. He chairs the IEEE Computer Society Task Force on Embedded Systems and is the Vice-Chair of the IEEE Computer Society Task Force on NanoArchitectures. He also co-chairs the ACM SIGDA TC on Nanoelectronics/Nanotechnologies. He has served as an IEEE Computer Society Distinguished Lecturer. He is a Golden Core member of the IEEE Computer Society.
