A fast MPSoC virtual prototyping for intensive signal processing applications

https://doi.org/10.1016/j.micpro.2011.06.001

Abstract

Due to the growing computation rates of intensive signal processing applications, using a Multiprocessor System on Chip (MPSoC) has become an incontrovertible solution to meet the functional requirements. Today, Electronic System Level (ESL) design is considered a vital premise to overcome the design complexity intrinsic to the heterogeneity of these devices. However, the development of system-level tools faces extremely challenging requirements such as rapid system prototyping, accurate performance estimation, and reliable design space exploration (DSE).

Focusing on the issue of ESL development tools, this paper describes an MPSoC design environment which targets the Multidimensional Intensive Signal Processing (MISP) application domain. Within this environment, we first define a generic execution model that supports any type of MPSoC. It can adapt to any parallel application and efficiently handle the scheduling and synchronizations at all levels of granularity. Second, a new Virtual Processor (VP)-based simulation technique is proposed for implementing the execution model. This proposal leverages the high-level specification of the system to provide a heterogeneous MPSoC simulation without using an Instruction Set Simulator (ISS). VP-based simulation is implemented in SystemC at a timed transactional level, allowing a good trade-off between high simulation speed and performance estimation accuracy. The usefulness and the effectiveness of our MPSoC environment are illustrated through two MISP applications executed on a typical MPSoC. Results show that our approach enables fast MPSoC virtual prototyping, data transfer and timing analysis, and reliable DSE for architectural optimizations.

Highlights

► We define an appropriate MPSoC execution model for intensive signal processing applications.
► We implement the MPSoC execution model using a new Virtual Processor (VP)-based simulation technique.
► VP-based simulation is implemented in SystemC at a timed transactional level allowing a good trade-off between high simulation speed and performance estimation accuracy.
► Fast MPSoC virtual prototyping, data transfers and timing analysis, and reliable DSE for architectural optimizations are the main results of our approach.

Introduction

Nowadays, embedded signal processing applications such as those used in networking, multimedia, and communications are becoming more and more complex. Two main aspects characterize these applications. The first involves the high complexity of the data structures, which generally represent multidimensional data arrays, while the second concerns the potential parallelism available in the application functionality. These two aspects lead to the Multidimensional Intensive Signal Processing (MISP) application domain. Hence, hardware designers are obliged to come up with new architecture definitions for executing this class of applications. An inevitable solution to meet the performance goals consists in placing several processors on the same chip, thus creating MultiProcessor Systems-on-Chip (MPSoC). Today, MPSoCs are increasingly used to build complex integrated systems [1]. They must be designed with custom architectures to balance the implementation constraints between the application needs (i.e., high computation rates and low power consumption) and the production cost.

MPSoCs have a huge architectural solution space, which makes Design Space Exploration (DSE) a complex and most important challenge for designers. For instance, architectural parameters which can be explored include the processor type, the interconnection network topology, and the mapping of tasks (hardware or software) and data. In addition, a huge space of alternatives to implement and execute the systems is possible, which yields a multitude of performance trade-offs in terms of execution time, power consumption, cost, etc. For MISP applications, this challenge is especially intensified by the high potential parallelism and the complex data distribution. The complexity lies mainly in the organization of the elementary tasks which compose the application, and in the access patterns to their input and output data as parts of multidimensional arrays. These complex access patterns make it difficult to schedule the applications efficiently on MPSoCs. It is therefore important to define an appropriate MPSoC execution model that takes advantage of MISP application characteristics and also allows early system exploration.
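To make the notion of access patterns more concrete, the following is a minimal sketch, not taken from the paper, of how a repetitive task might read fixed-size patterns out of a two-dimensional array and how the resulting repetitions could be spread over processors. The names `extract_patterns` and `distribute`, the non-overlapping paving, the round-robin mapping, and the 4×8 frame are all illustrative assumptions; the access patterns of real MISP applications can be considerably more general.

```python
from typing import List

Array2D = List[List[int]]


def extract_patterns(frame: Array2D, tile_h: int, tile_w: int) -> List[Array2D]:
    """Split a 2-D array into non-overlapping tile_h x tile_w patterns.

    Each pattern is the input of one repetition of an elementary task, so
    the repetitions are independent of each other (assumption of this sketch).
    """
    rows, cols = len(frame), len(frame[0])
    if rows % tile_h or cols % tile_w:
        raise ValueError("frame shape must be a multiple of the pattern shape")
    patterns = []
    for r0 in range(0, rows, tile_h):        # step through the rows
        for c0 in range(0, cols, tile_w):    # step through the columns
            patterns.append([row[c0:c0 + tile_w] for row in frame[r0:r0 + tile_h]])
    return patterns


def distribute(n_repetitions: int, n_processors: int) -> List[List[int]]:
    """Round-robin assignment of repetition indices to processors.

    This is only one possible mapping, chosen here for simplicity.
    """
    return [list(range(p, n_repetitions, n_processors)) for p in range(n_processors)]


# Hypothetical 4x8 frame whose element at (r, c) holds r * 8 + c.
frame = [[r * 8 + c for c in range(8)] for r in range(4)]
patterns = extract_patterns(frame, 2, 4)
mapping = distribute(len(patterns), 2)
```

With these assumed sizes, the frame yields four 2×4 patterns, and the two processors receive repetitions 0 and 2, and 1 and 3, respectively. Changing the pattern shape or the mapping function changes the scheduling problem, which is the kind of choice an execution model has to make explicit.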

The challenge of MPSoC DSE is tackled by several frameworks through the development of Electronic System Level (ESL) tools. The objective is to unify hardware and software design and to offer rapid system-level prototyping. Based on requirements such as the timing accuracy and the simulation speed of the system, architects can select an appropriate level of abstraction to model the software executing on the processor. Fig. 1 shows the previously used models [2] and also our proposed approach, called Virtual Processor (VP)-based simulation. In the past decade, commercial tools have succeeded in providing conventional RTL simulation environments for low-level system prototyping. On the one hand, this approach was very useful to software developers for driver debugging and integration. On the other hand, it allows hardware engineers to keep their traditional view of the system. However, RTL tools cannot adequately support the complexity of future MPSoCs since they are too slow for a meaningful execution of the software. In an attempt to reduce simulation time, much research effort has been devoted to evaluating the system using Cycle-Accurate (CA) simulators. They simulate the micro-architecture at the clock-cycle level and are by far the most common type of simulator used. At a higher abstraction level, an Instruction Set Simulator (ISS) executes the instructions sequentially and has no notion of micro-architectural concurrency. The ISS description can be enhanced with timing annotations to approximate the execution time and obtain a behavioural model. However, the simulation speed of CA or behavioural ISS simulators is limited to a few hundred thousand simulated cycles per second. In addition to this challenge, as the architecture must be closely adjusted to the application needs, the frontiers between different domain experts (hardware, software, compilation, etc.) have to be broken. During the development process, the hardware/software interaction must be maintained and the transition between the different design steps must be as smooth as possible.

In order to answer the design challenges of MPSoCs dedicated to MISP applications, a new approach is needed. In this paper, two contributions in the field of MPSoC ESL design and simulation tools are made. First, an efficient MPSoC execution model adapted to MISP applications is defined. It respects a repetitive Model of Computation (MoC) [3], which offers a very suitable way to express and manage the potential parallelism in the system. Second, a VP-based simulation technique is presented. It shortens system verification time while keeping good performance estimation accuracy. The VP technique is implemented as a high-level simulation using the SystemC Transaction Level Modeling (TLM) 2.0 kit. It leverages the high-level system specification to provide a hardware/software co-simulation without using an ISS.
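The idea of simulating without an ISS can be illustrated with a small sketch. It is written in plain Python rather than SystemC and is not the implementation described in this paper: the classes `Memory`, `Task`, and `VirtualProcessor`, the fixed memory latency, and the per-task cycle annotation are all hypothetical names and values chosen for illustration. The task body runs natively on the host, while a wrapper turns its data accesses into timed transactions and advances a local clock.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Shared memory target; every access costs a fixed latency (assumed value)."""
    latency_cycles: int = 10
    data: Dict[int, int] = field(default_factory=dict)
    transactions: int = 0

    def read(self, addr: int) -> int:
        self.transactions += 1
        return self.data.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.transactions += 1
        self.data[addr] = value


@dataclass
class Task:
    """A software task run natively, annotated with an estimated compute cost."""
    run: Callable[[List[int]], List[int]]
    in_addrs: List[int]
    out_addrs: List[int]
    compute_cycles: int


class VirtualProcessor:
    """Stands in for a processor: executes mapped tasks and keeps a local time."""

    def __init__(self, name: str, memory: Memory) -> None:
        self.name = name
        self.memory = memory
        self.local_time = 0  # elapsed time in cycles

    def execute(self, task: Task) -> None:
        # Wrapper role: turn the task's data needs into timed memory transactions.
        inputs = []
        for addr in task.in_addrs:
            inputs.append(self.memory.read(addr))
            self.local_time += self.memory.latency_cycles
        outputs = task.run(inputs)  # native execution, no instruction decoding
        self.local_time += task.compute_cycles
        for addr, value in zip(task.out_addrs, outputs):
            self.memory.write(addr, value)
            self.local_time += self.memory.latency_cycles


mem = Memory(latency_cycles=10, data={0: 3, 1: 5})
vp = VirtualProcessor("vp0", mem)
vp.execute(Task(run=lambda xs: [sum(xs)], in_addrs=[0, 1], out_addrs=[2], compute_cycles=40))
```

In this toy run the task reads two words, adds them, and writes one result, so the virtual processor accumulates two read latencies, the annotated compute time, and one write latency. A SystemC/TLM version would replace the direct method calls with transactions over sockets and let the simulation kernel manage time; that part is deliberately left out here.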

This paper is organized as follows. Section 2 presents related work, and Section 3 gives an overview of the repetitive MoC on which our approach relies. In Section 4, an explicit execution model is defined in order to obtain a precise implementation of the MPSoC. Section 5 presents our technique for high-level simulation. To evaluate our approach, experimental results are presented in Section 6.

Section snippets

Related works

In an attempt to deal with the parallelism inherent in intensive signal processing applications, most approaches rely on a specific MoC. In general, such a MoC proposes a high-level formalism in order to exploit the parallelism at the task level in an easy and efficient way. Among widely used MoCs, we can cite Kahn Process Network (KPN) [4], Synchronous DataFlow (SDF) [5], multi-dimensional SDF (MDSDF) [6] and ArrayOL [3]. The main comparison criteria are the allowed data structures

Repetitive MoC overview

The design of MPSoCs in our work specifically relies on the repetitive MoC of ArrayOL [3], which offers a very suitable way to express and manage the potential parallelism in the system. This MoC is accessed in our design environment via the MARTE (Modeling and Analysis of Real-time Embedded systems) standard profile [7]. MARTE allows both the software and the hardware of a system to be modeled using UML. The hardware/software mapping can also be represented using the same repetitive formalism. With the

Execution model

The MoC we have presented specifies accurate semantics for the application. However, for a given application and architecture, the system can still be implemented and executed in many different ways: whichever execution technique is used, the same outputs will be generated, but with different non-functional properties such as the time to complete, the memory needed, or the power consumed. It is therefore important to define an efficient execution model. This model should be generic enough so

High level SystemC simulation

In this section, we propose an efficient simulation technique using standard SystemC and its TLM 2.0 kit. The first objective of this proposal is to verify the functionality of the MPSoC by means of rapid system virtual prototyping. The second objective is to allow software engineers to perform a timing analysis and to monitor the traffic patterns over the interconnect created by the execution of concurrent tasks. For the MISP application domain which is mainly data-flow oriented,

Experimental results and evaluation

This section illustrates the usage of our approach to perform DSE. For this purpose, two case studies are used: the Downscaler application, which has already been briefly presented in Section 3, and the H.263 encoder application, which is also a typical MISP application. The possible architecture to execute these applications has several parameters which can vary: the number of processors (4–16), the number of memory banks (1, 2 and 4), and the cache size (2–64 kB). As the applications are based
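The size of the design space described above can be sketched as follows. The bounds for the number of processors and the cache size come from the text, but the intermediate steps (powers of two) are an assumption, and `toy_estimate` is a purely illustrative cost function standing in for one simulation run; it is not measured data and does not reflect the results of this paper.

```python
from itertools import product

# Parameter ranges quoted in the text; the power-of-two steps between the
# bounds are an assumption of this sketch.
PROCESSORS = [4, 8, 16]
MEMORY_BANKS = [1, 2, 4]
CACHE_KB = [2, 4, 8, 16, 32, 64]


def design_space():
    """Yield every (processors, banks, cache_kb) configuration."""
    yield from product(PROCESSORS, MEMORY_BANKS, CACHE_KB)


def explore(estimate):
    """Return the configuration with the lowest estimated execution time.

    `estimate` is a caller-supplied function standing in for one simulation run.
    """
    return min(design_space(), key=lambda cfg: estimate(*cfg))


def toy_estimate(processors, banks, cache_kb):
    # Hypothetical cost model for illustration only: more resources, less time.
    return 1_000_000 / processors + 50_000 / banks + 20_000 / cache_kb


configs = list(design_space())
best = explore(toy_estimate)
```

Even with these coarse steps there are 54 configurations per application, which shows why the cost of a single simulation run matters for exhaustive exploration. In practice the estimate would also weigh cost and power, so the largest configuration would not automatically win as it does with this toy model.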

Conclusions

Targeting the MISP application domain, we have presented a new ESL simulation approach adapted to MPSoC design. Our approach speeds up the simulation by leveraging the high-level modeling provided by novel contributions such as MARTE. It introduces the notion of Virtual Processor which, in essence, consists in replacing each processor by the software tasks mapped onto it and adding a wrapper to translate the external behaviour of the processor. Performance can be further optimized

References (34)

  • W. Wolf et al., Multiprocessor System-on-Chip (MPSoC) technology, IEEE Transactions on CAD (2008)
  • T. Meyerowitz, Transaction Level Modeling Definitions and Approximations, 2005....
  • P. Boulet, Formal semantics of Array-OL, a domain specific language for intensive multidimensional signal processing....
  • G. Kahn, The semantics of a simple language for parallel programming
  • E.A. Lee et al., Synchronous data flow, Proceedings of the IEEE (1987)
  • M.J. Chen, E.A. Lee, Design and implementation of a multidimensional synchronous dataflow environment, in: Proceedings...
  • Object Management Group, A UML profile for MARTE, 2007....
  • A. Amar, P. Boulet, J.-L. Dekeyser, F. Theeuwen, Distributed process networks using half FIFO queues in CORBA, in:...
  • P. Dumont, Spécification multidimensionnelle pour le traitement du signal systématique, Thèse de doctorat. PhD Thesis,...
  • T.M. Parks, Bounded scheduling of process networks, PhD Thesis, EECS Department, University of California, Berkeley,...
  • J.T. Buck, Scheduling dynamic dataflow graphs with bounded memory using the token flow model. Ph.D. thesis, University...
  • Open SystemC Initiative, SystemC, World Wide Web document, 2008. URL...
  • IEEE, System Verilog, 2005....
  • L. Cai, D. Gajski, Transaction level modeling: an overview, in: Hardware/Software Codesign and System Synthesis, 2003,...
  • The SoCLib project: an open modelling and simulation platform for system on chip design....
  • G. Beltrame et al., ReSP: a nonintrusive transaction-level reflective MPSoC simulation platform for design space exploration, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2009)
  • B. Bailey, System level virtual prototyping becomes a reality with OVP donation from Imperas, Tech. rep., EDA, June...
    Rabie Ben Atitallah is currently an Associate Professor in Computer Science at the University of Valenciennes and member of LAMIH laboratory within the DIM (Decision, Interaction, and Mobility) team. He is also an associated member of DaRT project at the INRIA Lille-Nord Europe research institute. He is an IEEE member and a member of High Performance and Embedded Architecture and Compilation (HiPEAC) European Network of Excellence. Previously, he received his PhD degree in Computer Science from the University of Lille1 in March 2008. Between March 2008 and August 2009, he had a post-doctoral position at INRIA Lille-Nord Europe and the University of Valenciennes. His research interests include Embedded system design, MultiProcessor System-on-Chip (MPSoC), Low power-aware design, Virtual prototyping, Simulation, and Dynamic reconfigurable computing.

    Éric Piel, born in France, is currently a post-doc in the Software Technology Department of the Delft University of Technology (The Netherlands), working in the Poseidon project in partnership with Thales Nederland. The subject of the research is to ease and improve the integration of large-scale component-based systems. Previously, he received his PhD in 2007 at INRIA Lille (France) on the subject of embedded system specification and model transformations, and his engineering diploma in Computer Science at the University of Technology of Compiègne (France) in 2003. He has also worked in industry in the R&D department of Bull on the subject of mixing real-time capabilities and parallel processing.

    Smail Niar is a Professor in computer science at the Institut des Sciences et des Techniques de Valenciennes (Valenciennes - France). His research activities are done at LAMIH – “Information, Decision Making & Embedded Systems” Research group. He is also a member of INRIA-Lille DaRT Project and member of High Performance and Embedded Architecture and Compilation (HiPEAC), the European Network of Excellence FP7-ICT programme.

    Philippe Marquet is currently an assistant professor at the University of Lille, France and a researcher within INRIA, the French institute for research in computer science. He received a PhD in Computer Science from the University of Lille in 1992. His research interests include the design of parallel, embedded and reconfigurable architectures, and the definition of programming models, languages and compilers dedicated to parallel computing. He also worked on the definition and implementation of real-time operating systems for SMP architectures. Recently he has worked on the design of a massively parallel architecture on a chip. He has (co-)advised 11 PhD theses.

    Jean-Luc Dekeyser received his PhD degree in computer science from the University of Lille 1 in 1986; afterwards, he held a fellowship at CERN Geneva. After a few years at the Supercomputing Computation Research Institute at Florida State University, where he worked on high performance computing for Monte Carlo methods in High Energy Physics, he joined the University of Lille 1 in France as an assistant professor in 1988. There he worked on the data parallel paradigm and vector processing. He created a research group working on High Performance Computing in the CNRS lab in Lille. He is currently Professor in computer science at the University of Lille 1 and also heads the DaRT INRIA project at the INRIA Lille Nord Europe research center. His research interests include embedded systems, System on Chip co-design, synthesis and simulation, performance evaluation, High Performance Computing and Model Driven Engineering.
