Dynamic self-reconfiguration of a MIPS-based soft-core processor architecture

https://doi.org/10.1016/j.jpdc.2017.09.013

Highlights

  • New dynamic partial reconfiguration technique.

  • On-demand self-reconfigurable processor architecture.

  • Efficiency-based replacement strategies for scheduling the reconfigurations.

  • Arithmetic library with interchangeable software and hardware functions.

  • Transparent automatic software-to-hardware substitution of arithmetic functions.

Abstract

The rising demand for computational performance is a permanent trend in our increasingly digital world. Consistently addressing this trend poses a challenge for every embedded processor system. This paper proposes the use of reconfigurable processor architectures to increase processing performance “on demand” while running a specific target application. The reconfiguration is used to interchange specialized co-processors attached to a static soft-core processor during run-time. Different self-optimizing software–hardware substitution mechanisms, inspired by the field of organic computing, are implemented and evaluated using two different synthetic benchmarks and an exemplary application from the field of parallel robotics. An efficient self-optimization can be reached by combining a speed-up-based replacement strategy for scheduling the reconfigurable co-processors with a least-mean-square optimization algorithm, without requiring any a priori application profiling. For a reduced number of reconfigurable co-processors, the results show that the proposed software–hardware reconfiguration strategy generally provides better trade-offs between the required hardware resources and the performance improvement than the equivalent soft-core processor with the same number of static co-processors.

Introduction

Current embedded systems are used in a variety of applications, ranging from portable multimedia devices to sensor networks and robotic systems. The stringent computing performance and power consumption requirements in combination with the increasing demand for low cost and short time-to-market make designing these embedded systems challenging.

One possible approach to meet the aforementioned design goals is the use of application-specific processors (ASPs), where a baseline processor is specialized for the efficient execution of a specific application or a set of applications. This specialization is performed by extending the baseline processor with new application-specific instructions or with dedicated hardware co-processors. The efficiency of the resulting ASP in terms of performance and hardware resources has to be analyzed carefully. This is especially true for applications whose processing characteristics change significantly during run-time depending on external factors, such as the input data to be processed. For these applications, the use of dynamically reconfigurable application-specific processors (rASPs) is mandatory, since the specialized hardware modules can be interchanged depending on the application’s current computational requirements [24]. In contrast to a static ASP, which includes all necessary dedicated hardware co-processors, a dynamically reconfigurable ASP architecture can exploit the interchange of dedicated hardware co-processors to reduce power consumption by reducing the required silicon area. Even though this is an important factor, this work focuses on achieving higher performance on small FPGAs using a dynamically reconfigurable system. Thus, cost-effectiveness moves to the foreground.

rASPs can be implemented on modern configurable logic devices, like the Xilinx Virtex-6 FPGA family, which provides a feature to reconfigure sections of the FPGA while keeping the other sections operational during run-time. In terms of Xilinx FPGAs, this feature is called dynamic partial reconfiguration (DPR) [33]. To make use of this feature, the design is split into static and dynamic partitions (see Fig. 1). Besides components like the soft-core processor itself, the static part of the design also implements the logic to control the ICAP (Internal Configuration Access Port) module, which actually performs the reconfiguration [33]. This hardware interface port provides direct access to the FPGA’s reconfigurable fabric. The dynamic part of the design can be divided into several reconfigurable partitions (slots), in which different modules can be exchanged during run-time. These modules reside as partial bitstreams in external memory.
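
As a rough illustration of this flow, the C sketch below streams a partial bitstream, fetched from external memory, word by word into a memory-mapped configuration port while the static part of the design keeps running. The register addresses, the status-bit layout and the data structure are hypothetical placeholders, not the actual ICAP primitive interface or a Xilinx driver API.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-mapped configuration-port registers (placeholders only,
 * not the actual Xilinx ICAP primitive or driver interface). */
#define CFG_PORT_DATA   (*(volatile uint32_t *)0x40001000u)  /* write one configuration word */
#define CFG_PORT_STATUS (*(volatile uint32_t *)0x40001004u)  /* bit 0 = port busy            */

/* One partial bitstream per reconfigurable module, stored in external memory. */
typedef struct {
    const uint32_t *words;    /* bitstream data in external memory */
    size_t          n_words;  /* length in 32-bit words            */
} partial_bitstream_t;

/* Stream a partial bitstream into the configuration port; the static part of
 * the design (soft-core processor, memories, I/O) keeps running meanwhile. */
static void reconfigure_slot(const partial_bitstream_t *bs)
{
    for (size_t i = 0; i < bs->n_words; ++i) {
        while (CFG_PORT_STATUS & 0x1u) {
            /* busy-wait until the port accepts the next word */
        }
        CFG_PORT_DATA = bs->words[i];
    }
}
```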

Choosing which reconfigurable modules should be loaded at which time during run-time is a challenging task. An optimal interchange of these reconfigurable modules, which takes the time required for the reconfiguration into account, increases the hardware efficiency of the whole architecture. Hence, such reconfigurable architectures with reduced hardware resources can reach processing performances (including the overhead of the reconfiguration process) similar to those of static architectures, when both processor architectures are implemented using the same target technology (e.g., FPGA or ASIC + eFPGA).
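
The underlying trade-off can be sketched as a simple break-even check: a co-processor is only swapped in if its accumulated cycle savings are expected to exceed the one-time reconfiguration cost. The structure fields, cycle counts and numbers below are illustrative assumptions, not the replacement strategies evaluated later in the paper.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Bookkeeping for one software routine that could be replaced by a
 * reconfigurable co-processor (all fields and numbers are illustrative). */
typedef struct {
    uint32_t sw_cycles;        /* cycles per call in software                    */
    uint32_t hw_cycles;        /* cycles per call on the co-processor            */
    uint32_t reconfig_cycles;  /* one-time cost of loading the partial bitstream */
    uint32_t expected_calls;   /* calls expected before the module is evicted    */
} candidate_t;

/* Swap the module in only if the accumulated savings exceed the
 * reconfiguration overhead, i.e. the swap pays for itself. */
static bool worth_reconfiguring(const candidate_t *c)
{
    if (c->hw_cycles >= c->sw_cycles)
        return false;
    uint64_t savings = (uint64_t)(c->sw_cycles - c->hw_cycles) * c->expected_calls;
    return savings > c->reconfig_cycles;
}

int main(void)
{
    /* Example: 400 vs. 40 cycles per call, 20000 cycles reconfiguration cost. */
    candidate_t div32 = { 400u, 40u, 20000u, 100u };
    printf("swap in: %s\n", worth_reconfiguring(&div32) ? "yes" : "no");  /* prints "yes" */
    return 0;
}
```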

This paper extends a previously published hardware mechanism for performing the self-reconfiguration in soft-core processors [23]. In contrast to other strategies, where the reconfiguration process is activated by the application programmer or the compiler (e.g., explicitly using pragmas inside the target software application), the proposed hardware mechanism, inspired by the field of organic computing, is capable of replacing software-based arithmetic functions with dedicated hardware accelerators during run-time without direct intervention of the running application (and thus entirely transparent to the software programmer). Organic computing [20] endeavors to map characteristics of living beings onto information systems. Just like their natural counterparts, these organic systems possess so-called self-x properties to conduct the intended adaptation. Self-organization, self-protection, self-healing and self-optimization are just some of these characteristics. Derived from [26], a self-x organic computing system can be modeled using an observer–controller concept (see Fig. 2). In this concept, the behavior of an abstract system is analyzed by an observer monitoring specific characteristics. These are conditioned and forwarded to the controller. Based on the reported data, the controller performs appropriate actions to modify the system in order to accomplish an overall, user-defined goal. In this work, the monitoring and controlling concepts are implemented as part of the processor architecture.
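
A minimal software analogue of this observer–controller loop is sketched below: the observer counts invocations of the HW–SW substitutable functions, and the controller maps the most frequently called, not-yet-accelerated function to a free reconfigurable slot. In the proposed processor this logic is realized in hardware; the counter array, slot table and parameters here are hypothetical and only illustrate the concept.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define N_FUNCTIONS 8   /* number of HW-SW substitutable functions (assumed)     */
#define N_SLOTS     2   /* number of reconfigurable co-processor slots (assumed) */

static uint32_t call_count[N_FUNCTIONS];           /* observer: invocation counters        */
static int      slot_owner[N_SLOTS] = { -1, -1 };  /* controller: function mapped per slot */

/* Observer: invoked on every call of a substitutable arithmetic function. */
static void observe_call(int func_id)
{
    call_count[func_id]++;
}

/* Controller: map the most frequently called, not-yet-accelerated
 * function to a free slot (the bitstream load itself is omitted). */
static void control_step(void)
{
    int best = -1;
    for (int f = 0; f < N_FUNCTIONS; ++f) {
        int mapped = 0;
        for (int s = 0; s < N_SLOTS; ++s)
            if (slot_owner[s] == f) mapped = 1;
        if (!mapped && (best < 0 || call_count[f] > call_count[best]))
            best = f;
    }
    for (int s = 0; s < N_SLOTS && best >= 0; ++s) {
        if (slot_owner[s] < 0) {      /* free slot found: trigger reconfiguration */
            slot_owner[s] = best;
            break;
        }
    }
}

int main(void)
{
    const int trace[] = { 3, 3, 5, 3, 1, 5, 3 };    /* example invocation trace  */
    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; ++i)
        observe_call(trace[i]);
    control_step();                                 /* maps function 3 to slot 0 */
    control_step();                                 /* maps function 5 to slot 1 */
    printf("slot 0 -> f%d, slot 1 -> f%d\n", slot_owner[0], slot_owner[1]);
    return 0;
}
```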

The proposed self-adaptive hardware mechanism is implemented on a MIPS-based processor for evaluation purposes, resulting in a self-adaptive rMIPS soft-core processor.

This paper is organized as follows: Section 2 presents related works. Section 3 illustrates the proposed reconfiguration strategy and the hardware implementation of the proposed self-adaptive rMIPS soft-core processor. An evaluation of the proposed mechanism using two synthetic benchmarks and an exemplary application is given in Section 4. Section 5 concludes the paper.

Section snippets

Related work

Configurable processor architectures are an emerging topic of research. An exemplary architecture is illustrated in Fig. 3a. First, the critical compute-intensive sections of the target application are identified and the appropriate reconfigurable hardware accelerators are generated (Fig. 3b). Then, these reconfigurable hardware accelerators are used on demand (i.e., swapped into a reconfigurable fabric and used to accelerate the computation), as shown in Fig. 3c.

Different kinds of

Self-adaptive rMIPS processor (rMIPS)

A 32-bit MIPS-compatible CPU serves as the starting point for the proposed self-adaptive rMIPS processor, shown in Fig. 4. The CPU used [9] implements an internal Harvard architecture (i.e., separate instruction and data access ports) [18] together with a classical 5-stage RISC pipeline (i.e., instruction fetch, instruction decode/operand fetch, execution, memory access, and write back). Moreover, the control logic includes a hazard unit to detect data and control hazards, which is also capable

Random and iterative synthetic benchmarks

In order to evaluate the proposed HW–SW substitution concept, two synthetic benchmarks are used, each executing a total of 100 runs (a small generator sketch is given after the list):

  • A random synthetic benchmark, which consists of an evenly-distributed invocation sequence of all implemented HW–SW substitutable functions.

  • An iterative synthetic benchmark, consisting of a long invocation sequence of all implemented HW–SW substitutable functions, which features recurring patterns separated by small randomly distributed sections.
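
For illustration, the following C sketch generates the two kinds of invocation sequences; the number of functions, the sequence lengths and the size of the random separator sections are made-up parameters, not the actual benchmark configuration.

```c
#include <stdio.h>
#include <stdlib.h>

#define N_FUNCS 8   /* number of HW-SW substitutable functions (assumed) */

/* Random benchmark: evenly distributed invocations of all functions. */
static void random_sequence(int *seq, int len)
{
    for (int i = 0; i < len; ++i)
        seq[i] = rand() % N_FUNCS;
}

/* Iterative benchmark: a recurring pattern over all functions,
 * separated by short randomly distributed sections. */
static void iterative_sequence(int *seq, int len, int random_gap)
{
    int i = 0;
    while (i < len) {
        for (int f = 0; f < N_FUNCS && i < len; ++f)     /* recurring pattern */
            seq[i++] = f;
        for (int r = 0; r < random_gap && i < len; ++r)  /* random separator  */
            seq[i++] = rand() % N_FUNCS;
    }
}

int main(void)
{
    int rnd[32], itr[64];
    srand(1);
    random_sequence(rnd, 32);
    iterative_sequence(itr, 64, 3);
    for (int i = 0; i < 64; ++i)
        printf("%d ", itr[i]);
    printf("\n");
    return 0;
}
```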

Conclusions

The proposed self-adaptive rMIPS soft-core processor is capable of reducing the execution time of a target application during run-time by replacing the attached HW-accelerators according to the execution’s computational demands. By this, the overall goal of increased performance is accomplished. The number of available RCP slots and, even more so, the applied reconfiguration replacement strategy have a major impact on the achievable performance gain. Even system configurations with just a

References (34)

  • A. Azarian, et al., Reconfigurable computing architecture survey and introduction

  • L. Bauer, et al., Run-time Adaptation for Reconfigurable Embedded Processors (2010)

  • L. Bauer, et al., RISPP: rotating instruction set processing platform

  • V. Baumgarte, et al., PACT XPP - A self-reconfigurable data processing architecture, J. Supercomput. (2003)

  • J. Becker, et al., DReAM: A dynamically reconfigurable architecture for future mobile communication applications

  • T.J. Callahan, et al., The Garp architecture and C compiler, Computer (2000)

  • J. Cong, et al., A fully pipelined and dynamically composable architecture of CGRA

  • V. Govindaraju, et al., DySER: Unifying functionality and parallelism specialization for energy-efficient computing, IEEE Micro (2012)

  • A. Grant, opencores.org Project: 32-bit MIPS Processors, Part of the eXtensible Utah Multicore (XUM), University of...

  • S. Hauck, et al., The Chimaera reconfigurable functional unit, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2004)

  • J.F. Junior, et al., Towards an adaptable multiple-ISA reconfigurable processor

  • M. Kock, et al., Hardware-accelerated design space exploration framework for communication systems, Analog Integr. Circuits Signal Process. (2014)

  • J. Kotlarski, et al., Influence of kinematic redundancy on the singularity-free workspace of parallel kinematic machines, Front. Mech. Eng. (2012)

  • A. Lodi, et al., XiSystem: A XiRisc-based SoC with reconfigurable IO module, IEEE J. Solid-State Circuits (2006)

  • R. Lysecky, et al., Warp processors, ACM Trans. Des. Autom. Electron. Syst. (2004)

  • R. Lysecky, et al., A configurable logic architecture for dynamic hardware/software partitioning

  • B. Mei, et al., ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix

Stephan Nolting received his Masters degree in computer sciences from Leibniz Universität Hannover, Germany, in 2015. Since then, he has been working as a research engineer at the Institute of Microelectronic Systems in Hannover, Germany. Currently, he is working towards a Ph.D. degree in the field of algorithms and architectures for digital image processing and video-based driver assistance systems.

Guillermo Payá-Vayá obtained his Ing. degree from the School of Telecommunications Engineering, Universidad Politécnica de Valencia, Spain, in 2001. During 2001–2004, he was a member of the Digital System Design research group, Universidad Politécnica de Valencia, where he worked on the design of dedicated VLSI architectures for signal and image processing algorithms using pipelining, retiming, and parallel processing techniques. In 2004, he joined the Department of Architecture and Systems at the Institute of Microelectronic Systems, Leibniz Universität Hannover, Germany, and received a Ph.D. degree in 2011. He is currently Junior Professor at the Institute of Microelectronic Systems, Leibniz Universität Hannover, Germany. His research interests include embedded computer architecture design for signal and image processing systems.

Florian Giesemann received his Masters degree in computer sciences from Leibniz Universität Hannover, Germany, in 2012. Since then, he has been working as a research engineer at the Institute of Microelectronic Systems in Hannover, Germany. Currently, he is working towards a Ph.D. degree in the field of algorithms and architectures for digital image processing and video-based driver assistance systems.

Holger Blume received his diploma in electrical engineering in 1992 from the University of Dortmund, Germany. In 1997 he received his Ph.D. with distinction from the University of Dortmund, Germany. Until 2008 he worked as a senior engineer and as an academic senior councilor at the Chair of Electrical Engineering and Computer Systems (EECS) of RWTH Aachen University. In 2008 he obtained his postdoctoral lecture qualification. Holger Blume has been Professor for “Architectures and Systems” at the Leibniz Universität Hannover, Germany, since July 2008 and heads the Institute of Microelectronic Systems. His present research includes algorithms and heterogeneous architectures for digital signal processing, design space exploration for such architectures, as well as research on the corresponding modeling techniques.

Sebastian Niemann received his diploma in mathematics from the Leibniz Universität Hannover, Germany, in 2012. He is working as a research engineer at the Institute of Systems Engineering, Department of Systems and Computer Architecture, Hannover, Germany. Currently, he is working towards a Ph.D. degree in the field of continuous optimization, researching self-optimizing algorithm selection techniques that gradually reduce the impact of the selection problem over time.

Christian Müller-Schloer studied electrical engineering at the Technical University of Munich, where he received the Diploma degree in 1975 and the Ph.D. in 1977. In the same year he joined Siemens Corporate Technology in Munich and in Princeton, NJ, USA. In 1991 he was appointed full professor of computer architecture and operating systems at the Leibniz Universität Hannover. He has engaged in systems-level research such as system design and simulation, embedded systems, virtual prototyping, and adaptive and self-organizing systems. He is one of the founders of the German Organic Computing Initiative and the author of more than 200 papers and 14 books.
