Elsevier

Integration

Volume 63, September 2018, Pages 204-212
NoC-DPR: A new simulation tool exploiting the Dynamic Partial Reconfiguration (DPR) on Network-on-Chip (NoC) based FPGA

https://doi.org/10.1016/j.vlsi.2018.04.003

Highlights

  • A state-of-the-art NoC-DPR simulator is proposed.

  • Recommendations are derived for implementing DPR on NoC-based FPGAs.

  • The optimal network size is calculated based on reconfiguration time requirements.

  • The DPR of NoC-based FPGAs is studied and evaluated using an embedded application.

  • The reconfiguration time is improved by a factor of 6.5x over the conventional SRAM-based FPGA.

Abstract

Dynamic Partial Reconfiguration (DPR) of SRAM-based Field Programmable Gate Arrays (FPGAs) is attracting growing interest because it adds flexibility at run time, and FPGA manufacturers have recently been making it easier to design applications that use DPR. A main limitation of current DPR techniques (i.e., ICAP and JTAG) is a performance bottleneck: only one DPR is allowed at a time. In this paper, a state-of-the-art NoC-based FPGA simulator that supports DPR simulation is proposed. The proposed NoC-DPR simulator is used to investigate the design limitations and performance degradation of using DPR on NoC-based FPGAs. To estimate the reconfiguration time overhead that results from increasing the number of simultaneous DPRs on the FPGA fabric, experiments are carried out using the NoC-DPR simulator. These experiments reveal that the reconfiguration time overhead increases exponentially with the number of simultaneous DPRs. However, further investigation shows that a network of wormhole routers with virtual channels reduces the reconfiguration time by a factor of up to 4x compared with a network of wormhole routers without virtual channels.

Introduction

Many applications mapped onto SRAM-based FPGAs (Field Programmable Gate Arrays), such as signal processing (including image and video), software defined radio (SDR) [1], and electronic measurement, increasingly use the Dynamic Partial Reconfiguration (DPR) feature. Moreover, partially reconfigurable (PR) devices save chip area by programming only the physical resources needed in each operation phase. Programming only the desired block also saves power, because it allows static leakage to be reduced.

The prime factor in checking the feasibility of DPR techniques, such as ICAP and JTAG, is the available lead time, i.e., the latency between the configuration and the initiation of a PR, denoted the reconfiguration time (RT). Consequently, many researchers aim to optimize the RT of DPR, which is tied to and limited by the frame size of SRAM-based FPGA layouts [1].

Due to the continuous scaling of CMOS devices, manufacturers are increasing the number of functions implemented on a single chip. Therefore, the System-on-Chip concept has been introduced, in which processing elements (PEs) and storage elements (SEs) are connected by a complex communication architecture.

Within the last few years, communication among these PEs has become a vital factor in the design of large-scale systems. As designs place more PEs in parallel to maximize their capability, processing power has increased and data-intensive applications have emerged. Consequently, several challenges in communication among these PEs, when configured on FPGAs, have become significant and require innovative solutions. Therefore, a prominent communication concept known as Network-on-Chip (NoC) has been adapted for FPGAs to handle these challenges.

To investigate this NoC concept, a state-of-the-art tool called NoC-DPR has been developed [2]; it is a cycle-accurate simulator for NoCs that support DPR. The tool is used to simulate the performance of NoC-based FPGAs. NoC-DPR integrates a NoC simulator, NoCTweak [3], with ReChannel [4], a SystemC library for DPR simulation. All PEs of the NoC are reconfigured dynamically to adopt a new application at run time.

This paper is organized as follows. Section 2 provides an overview of previous related research efforts in DPR simulation and NoC simulation. In Section 3, the NoC-DPR simulator architecture is presented. Section 4 investigates the NoC-DPR performance compared to the NoCTweak simulator. In Section 5, the DPR experiment is analyzed along with the results. Section 6 illustrates the case study of the embedded application using the NoC-DPR simulator. Design insights and recommendations for implementing DPR on NoC-based FPGAs are stated in Section 7. Finally, Section 8 concludes the paper and presents future work.

Section snippets

Related work

Since the first generation of Xilinx FPGAs that support DPR, Virtex-II in early 2007 [1], designing for DPR has been a rather complex task owing to the lack of supporting tools and the need for a full understanding of the FPGA architecture. Therefore, FPGA designers use DPR simulators at early design stages as a proof of concept and to reduce time to market. Several approaches [5], [6], [7] have been proposed to model dynamically reconfigurable systems at system level using SystemC,

NoC-DPR simulator architecture

The NoC-DPR simulator is a command-line tool that consists of a 2-D mesh network of routers, simulated by NoCTweak [3], as illustrated in Fig. 2. Each node consists of a Processing Element (PE), a Network Interface (NI), and an associated router. Each router connects to its four nearest neighboring routers, forming a 2-D mesh network. Using the ReChannel [4] library, each PE is dynamically reconfigured by a special type of data packet generated by a designated node (master node (0, 0)). Data packets are
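
The mesh connectivity described above can be sketched in a few lines of Python. This is a minimal illustration, not code from NoC-DPR or NoCTweak: the function names `mesh_neighbors` and `hop_count` are hypothetical, and the hop count assumes minimal (shortest-path) routing, which the text above does not specify.

```python
def mesh_neighbors(x, y, cols, rows):
    """Return the routers directly linked to router (x, y) in a cols x rows 2-D mesh.

    Interior routers have four neighbors; routers on an edge or corner
    have fewer, because a mesh (unlike a torus) has no wrap-around links.
    """
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < cols and 0 <= cy < rows]


def hop_count(src, dst):
    """Router-to-router hops between two nodes, ASSUMING minimal routing.

    In a 2-D mesh this is the Manhattan distance between the coordinates.
    """
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


if __name__ == "__main__":
    # Hypothetical 4x4 mesh with the master node at (0, 0).
    print(mesh_neighbors(0, 0, 4, 4))   # corner router: two neighbors
    print(mesh_neighbors(1, 1, 4, 4))   # interior router: four neighbors
    print(hop_count((0, 0), (3, 3)))    # farthest node from the master
```

Under these assumptions, the distance from the master node to the farthest PE grows with the mesh dimensions, which is one reason network size matters when reconfiguration packets originate from a single node.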

Network interface impact

The network interface is composed of two decoupling buffers that store and synchronize flits (a flit, or FLow control unIT, is the minimum unit of a message). Inserting an explicit NI between the PE and the router affects network performance, specifically latency and throughput.

The latency after inserting the NI is measured and compared to the latency of NoCTweak. A network of wormhole routers with a buffer size of 2 flits per input port running at

Results and discussion

The experiment in this work simulates DPR on a NoC-based FPGA platform using different network sizes and different numbers of parallel DPRs; the comparison is made with respect to the Reconfiguration Time (RT). First, a Virtex-5 xc5vfx100t FPGA is used to select different partial reconfiguration (PR) regions, then the bitstream size of each configuration region is calculated using the Xilinx ISE v14.7 tool. Finally, RT is determined by using partial reconfiguration cost

Case study: NoC-DPR with embedded application

Several embedded applications, such as an 802.11a WiFi receiver [17], a Video Object Plane Decoder, and a multimedia system [18], are examined using the NoC-DPR simulator to gain early insight into application performance at the design stage.

Each application is composed of a different number of tasks with different FIR. All tasks are mapped onto the network using either random mapping or the n-map mapping algorithm [19]. Furthermore, each task communicates with one or multiple destinations. The specifications of

Design recommendations

Some design insights and recommendations should be taken into account when designing DPR on NoC-based FPGAs using the proposed NoC-DPR simulator:

  • A general NoC platform cannot be used to implement a DPR application directly. For instance, while one processing element (PE) is undergoing DPR, the network should prevent other PEs from sending data to or receiving data from this PE until the DPR is finished.

  • In the proposed NoC-DPR simulator, it is assumed that PE (0, 0) is the master of the DPR process that is responsible
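
The first recommendation, isolating a PE from network traffic while it is being reconfigured, can be illustrated with a small Python sketch. It is only a conceptual model under stated assumptions: the class `DprGuard` and its methods are hypothetical names, not part of NoC-DPR, NoCTweak, or ReChannel, and the sketch simply refuses transfers rather than modeling how a real network would stall or buffer them.

```python
class DprGuard:
    """Tracks which PEs are currently undergoing DPR (hypothetical helper).

    A transfer is allowed only if neither endpoint is being reconfigured,
    mirroring the rule that other PEs must not exchange data with a PE
    until its DPR has finished.
    """

    def __init__(self):
        self._reconfiguring = set()

    def begin_dpr(self, pe):
        """Mark PE `pe` (an (x, y) tuple) as under reconfiguration."""
        self._reconfiguring.add(pe)

    def end_dpr(self, pe):
        """Mark PE `pe` as reconfigured and available again."""
        self._reconfiguring.discard(pe)

    def may_transfer(self, src, dst):
        """Return True if data may flow from `src` to `dst` right now."""
        return (src not in self._reconfiguring
                and dst not in self._reconfiguring)


if __name__ == "__main__":
    guard = DprGuard()
    guard.begin_dpr((1, 2))
    print(guard.may_transfer((0, 1), (1, 2)))  # blocked: destination in DPR
    print(guard.may_transfer((0, 1), (2, 2)))  # allowed: unrelated PEs
    guard.end_dpr((1, 2))
    print(guard.may_transfer((0, 1), (1, 2)))  # allowed again after DPR
```

Because the guard keeps a set of PEs rather than a single flag, the same check covers several simultaneous DPRs.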

Conclusion and future work

In this work, a state-of-the-art NoC-DPR simulator is proposed, and recommendations are derived for implementing DPR on NoC-based FPGAs so as to obtain the optimal network size.

NoC-based FPGAs enhance reconfiguration capabilities because multiple DPRs are performed simultaneously. However, supporting multiple DPRs requires additional resources such as a control unit and decoupling buffers. Accordingly, the reconfiguration time of DPR with NoC is better than

Acknowledgment

This research was funded by NTRA, ITIDA, Cairo University, and Zewail City of Science and Technology.

References (19)

  • A. Hassan et al.

    Performance evaluation of dynamic partial reconfiguration techniques for software defined radio implementation on FPGA

  • A. Hassan et al.

    Exploiting the dynamic partial reconfiguration on NoC-based FPGA

  • Anh T. Tran et al.

    NoCTweak: a Highly Parameterizable Simulator for Early Exploration of Performance and Energy of Networks On-chip

    (July 2012)
  • A. Raabe et al.

    ReChannel: describing and simulating reconfigurable hardware in SystemC

  • Adriatic Consortium

    Advanced Methodology for Designing Reconfigurable SoC and Application-targeted IP-entities in Wireless Communications Webpage

    (2002)
  • I. Benkhermi et al.

    System-Level modelling for reconfigurable SoCs

  • Alisson V. De Brito et al.

    An open-source tool for simulation of partially reconfigurable systems using SystemC

  • A. Schallenberg et al.

    Designing for dynamic partially reconfigurable FPGAs with SystemC and OSSS

  • N. Jiang

    BookSim Interconnection Network Simulator
