Elsevier

Parallel Computing

Volume 39, Issue 9, September 2013, Pages 424-441

A hardware/software platform for QoS bridging over multi-chip NoC-based systems

https://doi.org/10.1016/j.parco.2013.04.011

Highlights

  • We identify bridging requirements over NoC-based systems.

  • We explore NoC protocol stack layers on every link of the on-chip interconnect.

  • We propose a bridging scheme at transport layer of the NoC’s protocol stack.

  • We present a generic, efficient architecture for FPGA prototype of the bridge.

  • We propose a new software API for on/off-chip hosts to configure the bridged NoCs.

Abstract

Recent embedded systems integrate a growing number of intellectual property cores into increasingly large designs. Implementation, prototyping, and verification of such large systems have become very challenging. One reason is that chip/FPGA resources are limited, so it is not always possible to implement the whole design as a traditional system-on-a-chip. The state of the art is to partition such systems into smaller sub-systems and implement each on a separate chip, which in turn requires interconnecting the separate chips/FPGAs. Since Networks-on-Chip (NoCs) have become common interconnection solutions in embedded designs, we propose to bridge NoC-based SoCs, enabling generic multi-chip system interconnection. In this context, the contribution of this paper is threefold: (i) we explore the NoC protocol stack to determine the best layer for implementing the off-chip bridge, (ii) we propose a generic hardware architecture for the bridge, and (iii) we develop a new software architecture enabling seamless configuration and communication of multi-chip NoC-based SoCs. Finally, we demonstrate the performance, i.e., bandwidth and latency, of the bridge in a multi-FPGA platform, while the bridge guarantees the QoS of traffic. The synthesis results indicate that the implementation area cost of the bridge is only 1% of a Xilinx Virtex6 FPGA.

Introduction

The demand for executing more and more applications on consumer electronic devices has led to an increase in the size and complexity of recent embedded systems. On one hand, modular design approaches have enabled the integration of numerous processor cores, memories, and peripheral IPs into an embedded system; on the other hand, implementation, prototyping, and verification of such large designs on chips/FPGAs have become more complex and challenging, as follows.

Implementation: although Moore’s law indicates that the number of transistors that can be placed on a chip doubles approximately every two years, recent designs can be so large that the resources of a single chip are still insufficient. Recently, there have been some efforts to bring 3D IC stacking on a single chip into practice [1]; traditionally, however, system designers partition such large systems into smaller sub-systems to implement them as several Systems-on-Chip (SoCs), or accompany SoCs with a number of companion chips [2]. In such systems, denoted as multi-chip systems, each SoC has to communicate with other SoCs, and therefore a multi-chip interconnection scheme is essential.

Prototyping: FPGA prototyping has become a common approach for early software development and hardware design verification [3]. The limited resources of an FPGA, however, are not sufficient for prototyping current large SoCs. A system therefore has to be partitioned into a number of sub-systems, each of which is implemented on a single FPGA chip [4], so that a multi-FPGA system is formed [5]. The FPGA chips may be located on different circuit boards that are physically decoupled. Such systems require multi-chip interconnection mechanisms that also support board-to-board multi-FPGA communication.

Verification: both the implementations and the prototypes of current complex SoCs may still contain errors. Debug and verification processes are essential to find the erroneous parts of the systems and to verify the systems’ correct functionality. Therefore, external, fast, non-intrusive access to on-chip systems is required for two purposes: (i) to retrieve traces of on-chip data off-chip, and (ii) to enable an external host to perform debug and verification actions [6].

The common problem in the aforementioned trends is the need for an off-chip communication scheme for interconnecting individual SoCs. For this purpose, several techniques have been proposed in the literature, such as the work in [7], [8], [9]; however, to the best of our knowledge, none of them provides a generic solution that addresses the need for inter-chip, inter-FPGA, and chip/FPGA-host communication. In this work, we propose a generic, efficient hardware and software architecture for off-chip interconnection of individual SoCs. The SoCs internally have their own communication mechanisms to interconnect the cores and IPs. The off-chip interconnection mechanism is efficient when it is compatible with, and integrates seamlessly into, the on-chip interconnections. Since the Network-on-Chip (NoC) [10] has recently become a common on-chip interconnection technology, our proposed off-chip interconnection scheme is adapted to be compatible with the properties of NoCs.

Various NoC architectures exist in the literature, e.g., Æthereal [11], Nostrum [12], Mango [13], and QNoC [14], that may offer one or more quality of service (QoS) classes. Many applications, e.g., signal processing and video streaming, have timing and throughput demands that each require a specific QoS class, such as Guaranteed Throughput (GT) or Best Effort (BE). Consequently, the off-chip communication scheme should also be able to offer different QoS classes for the traffic of interconnected sub-systems in order to be compatible with the target on-chip interconnects.

In this paper, we propose a generic, efficient technique for interconnecting a NoC-based system with other (NoC-based) SoCs or any other external IP. In the rest of this paper, we refer to this scheme as bridging, since it makes a bridge for traffic from/to a chip to/from another chip/IP.

To make the bridge fully compatible with the on-chip interconnects, we establish four design requirements for the bridge: (i) the bridge should seamlessly extend the NoC such that memory-mapped accesses remain unchanged from the application’s point of view; in other words, the bridge is transparent in the global memory space of the system, so that the memory-consistency model of the system is preserved, (ii) multi-chip bridging at the circuit board level should be supported, and the bridge should temporally and physically decouple the systems implemented on different circuit boards, (iii) the quality of service offered by the overall interconnect (i.e., sub-NoCs + bridge(s)) to the applications should be preserved, and (iv) the bridge should be implemented efficiently, with high performance in terms of bandwidth and latency and with low area cost. In the context of a bridging scheme that fulfills all these requirements, the contribution of our paper is threefold, as follows.
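The transparency requirement in (i) can be illustrated with a minimal behavioural sketch in Python. It is not the paper's implementation: the `Chip` class, its address windows, and the forwarding rule are assumptions made only to show what "memory-mapped accesses remain unchanged" means for a master that does not know where an address physically lives.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Chip:
    """Hypothetical sub-system owning the address window [base, base + size)."""
    base: int
    size: int
    mem: Dict[int, int] = field(default_factory=dict)  # sparse local memory
    remote: Optional["Chip"] = None  # chip reachable through the (modelled) bridge

    def owns(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size

    def write(self, addr: int, value: int) -> None:
        if self.owns(addr):
            self.mem[addr] = value
        elif self.remote is not None and self.remote.owns(addr):
            # Assumed bridge behaviour: forward the access unchanged.
            self.remote.write(addr, value)
        else:
            raise ValueError(f"unmapped address {addr:#x}")

    def read(self, addr: int) -> int:
        if self.owns(addr):
            return self.mem.get(addr, 0)  # unwritten locations read as 0
        if self.remote is not None and self.remote.owns(addr):
            return self.remote.read(addr)
        raise ValueError(f"unmapped address {addr:#x}")
```

In this model, a master on one chip issues the same `read` and `write` calls whether the target address is local or on the other chip; only the address decides the route, which is the property the requirement asks the bridge to preserve.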

First, we investigate a NoC-based system to identify the possible bridge insertion points (i.e., links). At each link, we study different layers of the on-chip interconnect protocol stack [15] for possible bridging schemes. Our protocol stack model is based on the proposal in [16], where the model consists of five layers referred to as the session, transport, network, data link, and physical layers. Our design space investigation is driven by the NoC properties that have an impact on the bridge’s requirements. We refer to these properties as design options, and we identify them as follows: (i) parallel/serial link, (ii) flow control, (iii) buffering, (iv) routing, (v) synchronicity, and (vi) QoS. The investigation results in a novel proposal for a bridge design at the transport layer of the NoC.

Second, we propose a hardware architecture for the bridging scheme. The architecture is generic in the sense that it supports inter-chip, inter-FPGA, and chip/FPGA-IP systems alike. We implement an HDL version of the bridge kernel that supports a generic number of NoC connections, each of which may be either best-effort or guaranteed-throughput.

The third contribution of this work is a software architecture to configure the bridged sub-systems and to enable transparent communication over them in a global memory space. This architecture extends the run-time NoC configuration technique proposed in [17] and enables the bridge to integrate with, and act as, an on-chip host for NoC-based SoCs.

For our experiments, we set up a multi-FPGA system in which two instances of the bridge are utilized: one to connect two sub-NoCs, each implemented on a separate FPGA, and another to connect to an external host, which is a PC in our case. The experimental results show that the bridge achieves high performance in terms of bandwidth and latency, and confirm that it is able to provide the required QoS to data traffic.

The rest of this paper is organized as follows. In Section 2 we review the related work. Section 3 gives an overview of our target on-chip interconnection network. In Section 4, the bridge design requirements and the design options are explained in detail. Based on them, in Section 5, we investigate the NoC’s links and the protocol stack to propose the best bridging scheme. The bridge hardware architecture is proposed in Section 6, followed by the proposal for the software architecture in Section 7. The experimental results of the stand-alone bridge and of the case where the bridge is used in a multi-FPGA set-up are presented in Section 8. Finally, Section 9 concludes the paper.

Section snippets

Related work

There are three main research topics that are directly related to our work. This section discusses them in turn. First, the existing techniques in on-chip and off-chip signaling are introduced to compare the related work that targets NoC-based systems with our proposed technique. Second, the related work on on-chip interconnect protocol stacks is discussed. Third, we review interconnect configuration techniques.

Traditionally, a large system is built and prototyped by using

On-chip interconnect overview

The on-chip interconnection network has a key role in the bridging scheme. In this section we give an overview of the interconnection network. A connection between two Intellectual Property (IP) components is set up via the interconnect. The IPs are illustrated as a master and a slave in Fig. 1. The on-chip interconnect consists of traditional bus technology and a NoC architecture. The master starts a request by sending a write or read command to the bus (point 1). The bus is responsible for

Bridging requirements and design options

The requirements for a multi-chip bridge have direct impact on the bridging scheme. Therefore, in this section we first establish the design requirements as follows.

  • (1)

    Transparency: ideally, from an application point of view the bridge should be invisible, i.e., the global memory consistency model should be maintained. This is especially essential in the case of FPGA emulation of a partitioned system, where the emulated system should be functionally as close as possible to the real prototype. In

Protocol stack exploration

A connection between a master and a slave is formed by a set of physical communication links through the network, e.g., router-router and router-NI links. The links are illustrated as numbered points in Fig. 1. In principle, the bridge could be instantiated at each of these links. In this way, the bridge either partitions a NoC into two sub-networks, or connects two different NoC-based systems at these links.

Moreover, there is more than one layer of the interconnect protocol stack which is involved

Bridge hardware architecture

In this section we present the detailed architecture of the bridge to implement Scheme V illustrated in Fig. 2. The bridge architecture diagram is depicted in Fig. 3(a). The bridge kernel consists of five main units that are responsible for (i) providing the off-chip board-to-board interface, (ii) forming off-chip communication data, (iii) interfacing with multiple connections of the target NoC, (iv) arbitrating between the connections, and (v) controlling the flow of on-chip and off-chip data.
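To make unit (iv) more concrete, the following Python sketch models one possible way of arbitrating between guaranteed-throughput (GT) and best-effort (BE) connections that share a single off-chip link. The text shown here does not state which policy the bridge kernel uses, so everything in the sketch is an assumption: GT connections own slots in a repeating slot table, and BE connections are served round-robin in slots that are unreserved or whose owner has nothing to send. The function name and queue layout are hypothetical.

```python
from collections import deque


def arbitrate(slot_table, gt_queues, be_queues, cycles):
    """Return the connection id served in each cycle (None when the link idles).

    slot_table: list of GT connection ids or None (unreserved slot), repeated cyclically.
    gt_queues / be_queues: dict mapping connection id -> deque of pending words.
    """
    be_order = deque(sorted(be_queues))  # round-robin order over BE connections
    schedule = []
    for cycle in range(cycles):
        owner = slot_table[cycle % len(slot_table)]
        if owner is not None and gt_queues[owner]:
            gt_queues[owner].popleft()  # GT traffic always gets its reserved slot
            schedule.append(owner)
            continue
        # Slot is unreserved, or its owner is idle: offer it to BE traffic.
        served = None
        for _ in range(len(be_order)):
            candidate = be_order[0]
            be_order.rotate(-1)  # candidate moves to the back of the order
            if be_queues[candidate]:
                be_queues[candidate].popleft()
                served = candidate
                break
        schedule.append(served)
    return schedule
```

Under this assumed policy, a GT connection's share of the link depends only on the slot table and not on BE load, which is the kind of property a QoS-preserving bridge has to provide; other policies could achieve the same goal.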

Software architecture

Typically, NoC-based SoCs require a run-time configuration scheme [17]. A processor core, which is usually on the same chip as the NoC, is locally responsible for performing the configuration. This processor is called the host. In the presence of an off-chip connection, the host may be an external IP such as a Personal Computer (PC) or a local host of another SoC on a separate chip.
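The idea that the configuring host may be either on-chip or off-chip can be sketched as follows. This is an illustrative Python model only and not the API proposed in the paper: the class names, the `open_connection` helper, and the register offsets are all hypothetical. The point it shows is that the same configuration routine can run unchanged when the host's writes are carried across a bridge.

```python
class LocalPort:
    """Hypothetical on-chip host port: writes land directly in NI registers."""

    def __init__(self):
        self.regs = {}

    def write(self, addr, value):
        self.regs[addr] = value


class BridgedPort:
    """Hypothetical off-chip host port: each write is forwarded over a modelled bridge."""

    def __init__(self, target):
        self.target = target
        self.forwarded = 0  # number of writes that crossed the bridge

    def write(self, addr, value):
        self.forwarded += 1
        self.target.write(addr, value)


# Register offsets below are invented for illustration only.
PATH_REG, SLOTS_REG, ENABLE_REG = 0x0, 0x4, 0x8


def open_connection(port, ni_base, path, slots):
    """Configure one connection through whichever port the host has."""
    port.write(ni_base + PATH_REG, path)
    port.write(ni_base + SLOTS_REG, slots)
    port.write(ni_base + ENABLE_REG, 1)
```

For example, calling `open_connection(LocalPort(), 0x100, path=0b1011, slots=0b0101)` and calling it with a `BridgedPort` wrapped around a `LocalPort` leave identical register contents; the only difference is that the second case counts three forwarded writes.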

In this section we first briefly present a basic interconnect configuration scheme, which is proposed in [17], and we show

Discussion and experimental results

In this section we first exercise the standalone bridge to evaluate its area cost when implemented on an FPGA, and to assess its performance under variable traffic loads. Second, we use the bridge in a NoC-based system to connect two sub-systems implemented on two separate FPGA chips. Using this setup, we show system-level performance results of the bridge. The results are obtained by exercising the system for interconnect configuration with various applications’ data transfers. The results

Conclusions

In this paper we proposed a generic, efficient off-chip bridging scheme for SoCs that are implemented on separate silicon or FPGA chips. The scheme is compatible with NoC technology, which is commonly utilized in recent embedded systems. We have investigated the protocol stack of an on-chip interconnect to determine the best layer for implementing the bridge on the possible links of the interconnect. The proposal is a scheme at the transport layer of the stack. At this layer, the bridge fulfills

References (54)

  • E. Bolotin et al.

    QNoC: QoS architecture and design process for network on chip

    Journal of Systems Architecture

    (2004)
  • C. Liu et al.

    Bridging the processor-memory performance gap with 3D IC technology

    IEEE Design and Test of Computers

    (2005)
  • F. Steenhof, H. Duque, B. Nilsson, K. Goossens, R.P. Llopis, Networks on chips for high-end consumer-electronics TV...
  • A. Kulmala et al.

    Evaluating large system-on-chip on multi-FPGA platform

    Embedded Computer Systems: Architectures, Modeling, and Simulation

    (2007)
  • A.-M. Kouadri-Mostefaoui, B. Senouci, F. Petrot, Scalable multi-FPGA platform for networks-on-chip emulation, in: Proc....
  • S. Hauk, Multi-FPGA Systems, Ph.D. Thesis, University of Washington,...
  • K. Goossens, B. Vermeulen, A.B. Nejad, A high-level debug environment for communication-centric debug, in: Proc. DATE,...
  • A. Shacham, K. Bergman, L. Carloni, On the design of a photonic network-on-chip, in: Proc. NOCS, 2007, pp....
  • M. Stepniewska, A. Luczak, J. Siast, Network-on-Multi-Chip (NoMC) for Multi-FPGA Multimedia Systems, in: Proc. Conf. on...
  • S. Furber, S. Temple, A. Brown, On-chip and inter-chip networks for modeling large-scale neural systems, in: Proc....
  • W.J. Dally, B. Towles, Route packets, not wires: on-chip interconnection networks, in: Proc. DAC, 2001, pp....
  • K. Goossens, A. Hansson, The Aethereal network on chip after ten years: goals, evolution, lessons, and future, in:...
  • M. Millberg, E. Nilsson, R. Thid, A. Jantsch, Guaranteed bandwidth using looped containers in temporally disjoint...
  • T. Bjerregaard, The MANGO clockless network-on-chip: concepts and implementation, IMM, Danmarks Tekniske Universitet,...
  • J.D. Day, H. Zimmerman, The OSI reference model, in: Proceedings of the IEEE, vol. 71, 1983, pp....
  • A. Hansson, K. Goossens, An on-chip interconnect and protocol stack for multiple communication paradigms and...
  • A. Hansson, K. Goossens, Trade-offs in the configuration of a network on chip for multiple use-cases, in: Proc. NOCS,...
  • M. Lee, C. Chen, Multi-chip module, 2002. US Patent...
  • J. Darnauer, P. Garay, T. Isshiki, J. Ramirez, W. Wei-Ming Dai, A field programmable multi-chip module (FPMCM), in:...
  • J. Darnauer et al.

    A silicon-on-silicon field programmable multichip module (FPMCM) integrating FPGA and MCM technologies

    IEEE Transactions on CPMT

    (1995)
  • K. Takahashi, M. Sekiguchi, Through silicon via and 3-D wafer/chip stacking technology, in: Proc. Symp. on VLSI...
  • E. Beigne, P. Vivet, Design of on-chip and off-chip interfaces for a GALS NoC architecture, in: Proc. ASYNC, 2006, pp....
  • S. Evain et al.

    NoC design flow for TDMA and QoS management in a GALS context

    EURASIP Journal on Embedded Systems

    (2006)
  • P. Del Valle, D. Atienza, I. Magan, J. Flores, E. Perez, J. Mendias, L. Benini, G. De Micheli, A complete...
  • C. Chang et al.

    BEE2: a high-end reconfigurable computing system

    IEEE Design and Test of Computers

    (2005)
  • P. Liu, C. Xiang, X. Wang, B. Xia, Y. Liu, W. Wang, Q. Yao, A NoC emulation/verification framework, in: Proc. ITNG,...
  • X. Li, O. Hammami, Multi-FPGA emulation of a 48-cores multiprocessor with NOC, in: Proc. IDT, 2008, pp....