Increasing resource utilization in mixed-criticality systems using a polymorphic VLIW processor

https://doi.org/10.1016/j.sysarc.2018.01.003

Abstract

Mixed-criticality systems need to provide strict guarantees to hard real-time tasks and, simultaneously, deliver high throughput for non-critical tasks. However, techniques to enhance performance more often than not degrade analyzability; examples include caches, branch prediction, out-of-order (OoO) execution, superscalar processing, and simultaneous multithreading (SMT).

In this paper, we propose the use of a polymorphic VLIW processor to increase performance for non-critical tasks while maintaining analyzability. The processor achieves these goals by dynamically distributing computing resources (in the form of datapaths) to one or multiple threads. A static schedule guarantees critical tasks the minimum number of cycles they need to meet their deadlines. Datapaths that are not used by critical tasks can be assigned to non-critical tasks in a highly flexible way, thereby increasing resource utilization and, consequently, throughput. Our experiments show that our approach can exploit its dynamic properties to improve schedulability and assign up to 50% (25% on average) more resources to lower-priority threads during the execution of a static real-time schedule.

Introduction

In modern real-time application domains, a single high-performance processor is increasingly used for a wide variety of tasks/workloads while having to meet restrictions regarding power, cost, size, and maintainability. The strictness of these tasks' timing requirements can vary, giving rise to the field of mixed-criticality systems [1]. These systems should provide high performance but remain predictable in order to support tight analyses of the worst-case execution time (WCET). Using systems with low performance (but possibly higher predictability) will limit the number of (concurrent) tasks, while systems with low predictability need to severely overestimate the WCET of their tasks to guarantee timing safety (again limiting the number of tasks). Because low-performance systems also penalize the non-critical tasks, designers opt for commercial off-the-shelf (COTS) high-performance embedded platforms. These provide high performance to the lower-criticality tasks whenever the processor is not executing critical tasks. Current COTS platforms often employ heterogeneous multi-processors (HMPs) based on the ARM big.LITTLE design paradigm.

It is well known that techniques used to improve the performance of these (superscalar) platforms, such as caches and branch prediction, have an adverse effect on time predictability [2], increasing the gap between worst-case and typical execution times. Cycles reserved to cover this gap go unused whenever a critical task finishes before its WCET. As the availability of these unused cycles is at the mercy of the critical tasks, they cannot be relied upon for non-critical tasks. These non-critical tasks may not have (strict) timing requirements, but could have performance requirements, which makes it necessary to include them in the static real-time schedule of the critical tasks. In addition, HMPs suffer from two drawbacks. First, a task-core mismatch (when N_tasks < N_cores) will result in unused cores. Second, migrating a task from one core to another incurs a penalty for saving and restoring its state (internal registers) and for refilling caches and predictors (i.e., a cold start).

In this paper, we propose to use the ρ-VEX polymorphic VLIW processor for mixed-criticality systems. It can execute programs with a high level of Instruction-Level Parallelism (ILP) in a high-performance single-core 8-issue VLIW configuration, or multiple tasks or threads in a multicore configuration with smaller issue widths, allowing it to adapt to the workload. This is achieved by allowing neighboring datapaths to work independently or be combined into a single core. Independent datapaths are isolated from each other in terms of performance, so they do not suffer from performance interference. The highest-performing configuration provides 8 parallel datapaths, similar to the TMS320C6x family of Digital Signal Processors (DSPs) [3]. In the VLIW paradigm, all extraction of parallelism and instruction-level scheduling is pushed to the compiler. Consequently, the execution time of a ρ-VEX program is fully predictable, provided that the memory subsystem that feeds the processor with instructions and data is also predictable. For instance, instead of relying on dynamic branch prediction, the compiler restructures the code after analyzing the most likely control flow. Therefore, the ρ-VEX provides a high degree of predictability without sacrificing performance for high-ILP workloads.
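To make "neighboring datapaths ... combined into a single core" concrete, the Python sketch below enumerates the core configurations an 8-datapath processor could offer. It assumes (this is not stated above) that the datapaths are grouped into four pairs and that a core must occupy an aligned, power-of-two-sized run of neighboring pairs; the names `aligned_blocks` and `configurations` are hypothetical.

```python
from typing import List, Tuple

# Assumption: 8 datapaths grouped into 4 neighbouring pairs; a core must occupy
# an aligned, power-of-two-sized run of pairs (1, 2 or 4 pairs).
NUM_PAIRS = 4
DATAPATHS_PER_PAIR = 2


def aligned_blocks(start: int) -> List[int]:
    """Block sizes (in pairs) that may begin at pair index `start`."""
    sizes = []
    size = 1
    while size <= NUM_PAIRS:
        if start % size == 0 and start + size <= NUM_PAIRS:
            sizes.append(size)
        size *= 2
    return sizes


def configurations(start: int = 0) -> List[Tuple[int, ...]]:
    """All ways to split the pairs from `start` onward into cores,
    each configuration given as a tuple of core issue widths."""
    if start == NUM_PAIRS:
        return [()]
    result = []
    for size in sorted(aligned_blocks(start), reverse=True):
        width = size * DATAPATHS_PER_PAIR
        for rest in configurations(start + size):
            result.append((width,) + rest)
    return result


print(configurations())  # [(8,), (4, 4), (4, 2, 2), (2, 2, 4), (2, 2, 2, 2)]
```

Under these assumptions there are five configurations. With exactly two runnable tasks, for example, (4, 4) keeps all eight datapaths busy, whereas a hypothetical fixed design of four 2-issue cores would leave two cores idle, which is the task-core mismatch described for HMPs.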

Moreover, in contrast to HMPs, there is no migration penalty associated with changing the power-performance trade-off point of a task in the ρ-VEX. To support the multicore operating modes, the ρ-VEX register file can store four separate task/thread states (virtual cores or SMT contexts) simultaneously. This also allows it to rapidly switch between programs when running in 8-way single-core mode, as long as no more than four tasks/threads contend for processor resources. The register file design is similar to that of the Qualcomm Hexagon DSP processor found in current mobile chipsets [4].
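The absence of a migration penalty can be illustrated with a toy model in which all four contexts keep their registers resident, so that a reconfiguration only rewrites which datapath pairs each context drives. The sketch below is a hypothetical Python model, not the ρ-VEX hardware interface: `PolymorphicRegisterFile`, `reconfigure`, and `migrate` are illustrative names, the grouping into four pairs of two datapaths is an assumption, and alignment rules on pair assignment are not checked.

```python
from typing import Dict, List, Optional

NUM_CONTEXTS = 4  # from the text: four task/thread states are held at once
NUM_PAIRS = 4     # assumption: 8 datapaths grouped into 4 pairs of 2


class PolymorphicRegisterFile:
    """Toy model: each context keeps its own register set resident, so a
    reconfiguration only rewrites which lane pairs each context drives."""

    def __init__(self) -> None:
        self.contexts: List[Dict[str, int]] = [{} for _ in range(NUM_CONTEXTS)]
        # pair index -> owning context (None = idle); start in 8-issue mode
        self.lane_owner: List[Optional[int]] = [0] * NUM_PAIRS

    def reconfigure(self, lane_owner: List[Optional[int]]) -> int:
        """Install a new pair-to-context mapping; returns the number of
        register values copied, which is always zero in this model."""
        if len(lane_owner) != NUM_PAIRS:
            raise ValueError("expected one owner entry per lane pair")
        self.lane_owner = list(lane_owner)
        return 0

    def issue_width(self, ctx: int) -> int:
        """Datapaths currently driven by context `ctx`."""
        return 2 * self.lane_owner.count(ctx)


def migrate(src: Dict[str, int], dst: Dict[str, int]) -> int:
    """Hypothetical task migration between two separate cores: the whole
    register state is copied (and caches/predictors would start cold)."""
    dst.clear()
    dst.update(src)
    return len(src)


rf = PolymorphicRegisterFile()
rf.contexts[0]["r1"] = 42
copied = rf.reconfigure([0, 0, 1, 1])  # split into two 4-issue cores
print(copied, rf.issue_width(0), rf.issue_width(1), rf.contexts[0]["r1"])  # 0 4 4 42
```

The contrast lies in the return values: `reconfigure` copies nothing, while `migrate` copies every register, and on a real multiprocessor it would additionally start with cold caches and predictors.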

In our earlier work [5], we examined the benefits of using VLIW-based polymorphism for static scheduling of real-time task graphs, as well as real-time schedulability and performance on the ρ-VEX. This paper extends that work by providing a system architecture for mixed-criticality workloads and by evaluating the resulting throughput gains for the non-critical tasks in the workload. The contributions of this work are:

  • We propose to use VLIW-based polymorphism as a basis to design systems for mixed-criticality workloads.

  • We discuss how these systems are able to provide both temporal and spatial isolation.

  • We perform an evaluation using the ρ-VEX proof-of-concept in terms of throughput using task graphs generated from the Mälardalen real-time benchmark suite.

  • We show that the polymorphic VLIW is able to assign up to 50% more resources to non-critical tasks during execution of a static real-time schedule compared to heterogeneous processors with equal computational resources.

The remainder of this paper is structured as follows. Section 2 introduces the execution platform and the concepts necessary to understand this work. It also discusses the scheduling methodology that provides timing isolation between critical tasks while still exploiting processor polymorphism to increase schedulability. Section 3 presents a system architecture that provides spatial and temporal isolation and shows how processor polymorphism can be exploited to dynamically assign execution resources to non-critical tasks when critical tasks do not use them. Section 4 discusses how we use the scheduling methodology to create valid real-time schedules for the proposed platform. Section 5 presents the evaluation setup and discusses the results; Section 6 compares this work to existing literature; and Section 7 concludes the work.


Background

This section introduces concepts from earlier work that are necessary to understand this paper. We first discuss the polymorphic processing platform used for our evaluations, and subsequently the scheduling methodology we use to schedule real-time workloads on this dynamic processor.

System architecture for mixed-criticality systems

In this section, we describe how we use the adaptable processor combined with the scheduling methodology discussed in Section 2 to create a system architecture that provides temporal and spatial isolation for critical tasks and is able to provide high throughput for non-critical tasks.
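A minimal way to quantify what is left for non-critical tasks is to subtract, in every slot of a static schedule, the datapaths reserved for critical tasks from the eight available. The Python sketch below does this for a hypothetical schedule format (a list of slots, each mapping a critical task to its reserved datapaths). It ignores any constraints on how spare datapaths may be grouped into cores and is not the paper's evaluation code; `spare_datapaths` and `spare_fraction` are illustrative names.

```python
from typing import Dict, List

TOTAL_DATAPATHS = 8  # widest ρ-VEX configuration described in the text

# Hypothetical format: one dict per time slot, mapping each critical task
# to the number of datapaths the static schedule reserves for it.
Schedule = List[Dict[str, int]]


def spare_datapaths(schedule: Schedule) -> List[int]:
    """Datapaths left over for non-critical tasks in each slot."""
    spare = []
    for slot in schedule:
        used = sum(slot.values())
        if used > TOTAL_DATAPATHS:
            raise ValueError(f"slot reserves {used} datapaths")
        spare.append(TOTAL_DATAPATHS - used)
    return spare


def spare_fraction(schedule: Schedule) -> float:
    """Share of all datapath-slots that non-critical tasks could receive."""
    if not schedule:
        raise ValueError("empty schedule")
    return sum(spare_datapaths(schedule)) / (TOTAL_DATAPATHS * len(schedule))


schedule = [{"A": 8}, {"A": 4, "B": 2}, {"B": 4}, {}]
print(spare_datapaths(schedule))  # [0, 2, 4, 8]
print(spare_fraction(schedule))   # 0.4375
```

In this toy schedule, critical task A is narrowed from 8 to 4 datapaths between slots, which frees datapaths for non-critical work without any migration.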

Scheduling approach

This section discusses how the scheduling methodology presented in Section 2.2 is used to create valid static schedules for the proposed platform. We then propose two ways in which the performance of one of the tasks can be improved beyond merely meeting its deadline in the worst case.
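The methodology of Section 2.2 is not reproduced in this excerpt. As a stand-in, the Python sketch below shows only two basic checks that a static schedule on such a platform would plausibly need: each critical task, given the issue width reserved for it, finishes before its deadline, and the reserved widths never exceed the eight datapaths at any cycle. It assumes per-width WCET values (in cycles) are known for each task; `Reservation`, `is_valid`, and all numbers are hypothetical, and precedence constraints between tasks in a task graph are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TOTAL_DATAPATHS = 8


@dataclass(frozen=True)
class Reservation:
    task: str   # critical task name
    start: int  # cycle at which the task starts executing
    width: int  # issue width (datapaths) reserved for the task


def is_valid(reservations: List[Reservation],
             wcet: Dict[str, Dict[int, int]],
             deadline: Dict[str, int]) -> bool:
    """True if every task meets its deadline at its reserved width and the
    datapaths in use never exceed TOTAL_DATAPATHS."""
    events: List[Tuple[int, int]] = []
    for r in reservations:
        end = r.start + wcet[r.task][r.width]  # worst-case completion cycle
        if end > deadline[r.task]:
            return False
        events.append((r.start, r.width))  # datapaths acquired
        events.append((end, -r.width))     # datapaths released
    in_use = 0
    # Sorting (cycle, delta) applies releases before acquisitions at the same cycle.
    for _, delta in sorted(events):
        in_use += delta
        if in_use > TOTAL_DATAPATHS:
            return False
    return True


wcet = {"A": {2: 900, 4: 500, 8: 300}, "B": {2: 400, 4: 250, 8: 180}}
deadline = {"A": 600, "B": 600}
print(is_valid([Reservation("A", 0, 4), Reservation("B", 0, 4)], wcet, deadline))    # True
print(is_valid([Reservation("A", 0, 8), Reservation("B", 300, 8)], wcet, deadline))  # True
print(is_valid([Reservation("A", 0, 8), Reservation("B", 0, 2)], wcet, deadline))    # False
```

The second call shows the polymorphic case: A runs 8-wide and then hands all datapaths to B at cycle 300. The third fails because 10 datapaths would be in use at cycle 0.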

Experiments and evaluations

This section presents the experimental setup and the measurement results for two evaluation metrics: schedulability and resource utilization.

Related work

This work touches on (static real-time) schedulability, multi-threaded architectures, and time-predictable processors. Predictability and schedulability are discussed in [16] and [17]. Examples of processors with multiple contexts/threads for real-time systems are presented in [18] and [19]. In [20], increasing the number of cores is compared with increasing the number of register sets in terms of performance. A related VLIW architecture with multiple hardware contexts is the Itanium [21]…

Conclusions

This paper evaluates the ρ-VEX polymorphic processor in the context of mixed-criticality systems. We showed that it can exploit its dynamic properties to improve schedulability and assign up to 50% (25% on average) more resources to lower-priority threads during the execution of a static real-time schedule. The resulting increase in throughput depends on the tasks' characteristics, such as ILP. The nature of VLIW architectures provides a high degree of predictability, as it uses static branch…

Acknowledgement

This work has been supported by the ALMARVI European Artemis project nr. 621439.

Joost Hoozemans received his BSc in Computer Science from Utrecht University in 2011 and his MSc in Computer Engineering from Delft University of Technology in 2014. His work mainly focuses on operating system support and configuration scheduling for dynamic VLIW processors.

References

  • S. Baruah et al.

    Towards the design of certifiable mixed-criticality systems

    2010 16th IEEE Real-Time and Embedded Technology and Applications Symposium

    (2010)
  • L. Thiele et al.

    Design for timing predictability

    Real Time Syst.

    (2004)
  • Texas Instruments

    TMS320C66x DSP CPU and instruction set reference guide

    Texas Instruments Literature Number:...

    (2010)
  • L. Codrescu et al.

    Hexagon DSP: an architecture optimized for mobile multimedia and communications

    IEEE Micro

    (2014)
  • J. Hoozemans et al.

    Using a polymorphic VLIW processor to improve schedulability and performance for mixed-criticality systems

    Proc. 23rd IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (in press), Hsinchu, Taiwan

    (2017)
  • S. Wong et al.

    The Delft reconfigurable VLIW processor

    Proc. 17th International Conference on Advanced Computing and Communications, Bangalore, India

    (2009)
  • J.A. Fisher et al.

    Embedded Computing: A VLIW Approach to Architecture, Compilers, and Tools

    (2005)
  • P. Faraboschi et al.

    Lx: a technology platform for customizable VLIW embedded processing

    Proceedings of the 27th International Symposium on Computer Architecture

    (2000)
  • A. Brandon, J. Hoozemans, J. van Straten, S. Wong

    Exploring ILP and TLP on a polymorphic VLIW processor

    Springer...
  • A. El-Haj-Mahmoud et al.

    Virtual multiprocessor: an analyzable, high-performance architecture for real-time computing

    Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems

    (2005)
  • T. Sherwood et al.

    Discovering and exploiting program phases

    IEEE Micro

    (2003)
  • M. Zimmer et al.

    FlexPRET: a processor platform for mixed-criticality systems

    2014 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS)

    (2014)
  • A. Hansson et al.

    Compsoc: a template for composable and predictable multi-processor system on chips

    ACM Trans. Des. Autom. Electr. Syst.

    (2009)
  • J. Gustafsson et al.

    The Mälardalen WCET Benchmarks – Past, Present and Future

    (2010)
  • J. Yan et al.

    A time-predictable VLIW processor and its compiler support

    Real Time Syst.

    (2008)
  • J.A. Stankovic et al.

    What is predictability for real-time systems?

    Real Time Syst.

    (1990)
Jeroen van Straten was born in Zoetermeer, The Netherlands, on June 16th, 1991. He obtained his MSc in Computer Engineering from Delft University of Technology in May 2016 for his work on the implementation of a dynamic VLIW processor.

Stephan Wong was born in Paramaribo, Suriname, on October 20th, 1973. He obtained his Ph.D. from Delft University of Technology in 2002, after which he became an assistant professor at the same university. His research interests include reconfigurable computing, distributed collaborative computing, high-performance computing, embedded systems, hardware/software co-design, and network processing.