The SegBus platform – architecture and communication mechanisms

doi:10.1016/j.sysarc.2006.07.002

Journal of Systems Architecture

Volume 53, Issue 4, April 2007, Pages 151-169

https://doi.org/10.1016/j.sysarc.2006.07.002

Abstract

In this study, we introduce the SegBus architecture, a synchronous segmented bus platform for systems on chip. We present the envisioned structure in detail, and also address aspects of communication on the platform. The motivation behind SegBus is the search for improvements in several directions, such as global throughput, power consumption, modularity, and adaptability. By means of an example, we illustrate the capabilities of the described architecture. The implementation strategy targets FPGA technology and allows for the use of multiple clock domains. The platform emerges as a system that is highly configurable at design time and adaptable to various design constraints.

Introduction

The semiconductor industry continues to develop and deploy smaller technology nodes, making it possible for increasingly powerful applications to fit within a single chip. At the same time, alternative architectures are being proposed to support ever-increasing requirements on design features such as performance, power consumption, adaptability, and reusability. In general, these (partially) novel architectures try to extract the maximum benefit from current technologies with respect to these design characteristics, while also providing a smooth transition to future ones.

An answer to the above challenges is successfully given by the platform-based paradigm [18], [24]. Either in its “original” form or in derivatives such as communication-based [28] or interconnect-based [22] design, the platform-based paradigm promotes easier system construction. It opens the door to competition in component selection by offering a strong motivation for intellectual property (IP) development. It also satisfies reusability requirements, as the only constraints are compliance with interface specifications and with specific communication protocols. The one remaining issue is the adaptability of a given platform to a specific application domain. This is usually solved by providing application-specific platforms, where the characteristics of an application (or, preferably, of an application domain), such as required communication protocols, operating speeds, and necessary processing elements, are captured distinctly within the platform description.

Among the available platform solutions, the traditional bus-based system is one of the most common. The rich availability of standards, architectures, components and methods continues to sustain bus-oriented design as the favorite (or even the default) option for system development, for both off-chip and on-chip solutions.

One outstanding obstacle prevents the seamless application of modern technologies to system development. As feature sizes continue to decrease, interconnect becomes one of the main design constraints: it dominates power consumption and, because it scales poorly, degrades performance. In addition, the growing diversity of devices within a modern system-on-chip (SOC) places increasing pressure on design goals such as performance, power consumption, and communication. Both the design process and system performance are limited by the complexity of the interconnection between the modules and blocks integrated into these chips. Single-clocked systems are no longer a solution: different data transfer speeds are required, as well as parallel transmission. The traditional system bus may not be suitable for such a design. Only one module can transmit at a time, and the bus is slow because of the large capacitive load [9] caused by the interfaces of the attached modules and because of its large physical length. Additionally, the modern SOC designer assembles the system from ready-made components (IPs), which might not be easily adaptable to different clocking situations.

A solution to some of the above-mentioned problems is a segmented bus design combined with a globally asynchronous locally synchronous (GALS) [8] system architecture. In this approach, each distinct module of an SOC works from an optimized local clock, whereas interactions between modules are asynchronous. Hence, clock routing and clock skew are no longer system-level design issues; they are confined to a local synchronous segment.

Broadly, this study describes the realization of a synchronous segmented bus system, namely the SegBus platform. Here, each segment can be identified with a different clock domain. Between segments, there are FIFO structures (border units) with additional control logic, which we call segment borders. We introduce the reader, step by step, to the structural aspects of the SegBus platform and then describe in detail the communication protocols used and their implementation. The result is a parameterized platform, adaptable to multiple system requirements. The SegBus is built from a combination of synthesizable VHDL code and schematic descriptions. The underlying implementation technology is the ALTERA FPGA device families.

Related work. The concept of segmenting the buses has been proposed in the past, mainly for multi-computer architectures [16], [19], [33]. The work collected by Kartashev and Kartashev [15], for instance, presents a close resemblance to our SegBus approach, at computer level. More recent studies place a segmented bus in the context of a single-chip device. The segmented bus platform that we analyze here was initially introduced by Seceleanu et al. [25], [21], as an asynchronous architecture. This choice relaxed several analysis assumptions, as the request-acknowledge handshake signals provided the self-timed synchronization required for data transfers. The synchronous counterpart is introduced by Seceleanu [26].

Ewering [12] brings into focus the idea of partitioning the on-chip bus, having as goal the spatial and temporal reduction of the datapath design. The partitioned PARBUS is a very simple architecture, resembling a dual rail pipelined scheme. Functional units are placed within one partition, between two busses, equally segmented. Symmetrically placed switches, controlled by a global signal, connect the bus segments.

An illustrative analysis focused on segmented bus design is presented by Jone et al. [14]. The system is implemented as an ASIC, with specific characteristics of both the physical interconnect and the communication structure itself. Issues are analyzed starting at the transistor level, while the communication infrastructure allows tree-like constructs, unlike the partitioned bus approach taken in [12] (also an ASIC implementation).

A synchronizing buffer between two mutually asynchronous clock domains is presented by Kessels et al. [17]. However, the presented structure is unidirectional. Using it in our approach, where bidirectional data transfers are necessary, would carry too high an area penalty (double the size). Hence, one of the main elements on which we base our approach is the “glitch protection for unrelated clock sources” device (GPD), as described in [1]. It offers the possibility to choose between two clock signals, and it is an accepted element in the FPGA community.

The work of Srinivasan and Vijaykrishnan [30], on an AMBA-like segmented bus platform, is similar to our approach in that module communication is based on a “store-and-forward” scheme. Other solutions, such as those of Wang et al. [32] or Hsieh and Pedram [13], come close to the structures introduced in [12], [14]; that is, segments are separated by means of (bidirectional) switches. This solution benefits mostly from the fact that the respective platforms handle single-clock designs.

Overview of the paper. We start our study by introducing the architecture of the segmented bus, in Section 2. Then, we detail the communication mechanism throughout Section 3. The parameterized platform is described in Section 4, followed by an example, in Section 5. An analysis of the proposed platform and a comparison with related work are carried out in Sections 6 and 7, respectively. We end with some concluding remarks and with a brief look into future work topics, in Section 8.

Section snippets

The segmented bus platform

A bus-based system is composed of three kinds of modules: masters, slaves, and arbiters. Masters are active parties of bus transactions, requesting services from slaves – the passive parties of transactions. Since a bus is a shared communication link, only one master at a time may access the bus, that is, transfer data to or from a slave module. Hence, there is a need for arbitration between masters. The arbitration scheme may be either distributed or centralized.
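The need for arbitration can be illustrated with a minimal sketch of a centralized arbiter. It assumes a fixed-priority policy in which a lower master id wins; the text above does not state the policy, and the function name `grant_fixed_priority` is hypothetical.

```python
from typing import Iterable, Optional


def grant_fixed_priority(requests: Iterable[int]) -> Optional[int]:
    """Return the id of the master granted the bus, or None if nobody requests.

    Assumption: a lower id means a higher priority. The source does not
    specify the arbitration policy; this is only one possible choice.
    """
    return min(requests, default=None)


# Masters 3 and 1 request the bus in the same cycle; only one is granted.
winner = grant_fixed_priority([3, 1])
```

Whatever the policy, the arbiter returns at most one master per cycle, which is the property the shared link requires.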

In a distributed organization,

Communication analysis

When designing the SegBus platform, unique identification numbers are assigned to each of the employed devices. Also, the segments are numbered in a continuous manner, from left to right, starting with segment 0. We base the development of our synchronous system platform on a “store-and-forward” communication policy. Each segment border is marked by the presence of a FIFO (Fig. 3), which temporarily stores data to be sent either to the bus segment placed at its right, or to the one placed at
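The store-and-forward policy can be sketched in software as follows. This is an illustrative model only, not the VHDL implementation: the class `SegmentBorder`, the function `transfer`, and the FIFO depth of 4 are assumptions, and each border is taken to sit between two neighbouring segments in a left-to-right chain.

```python
from collections import deque
from typing import Deque, List, Tuple


class SegmentBorder:
    """FIFO between segment `left` and segment `left + 1` (hypothetical model)."""

    def __init__(self, left: int, depth: int = 4) -> None:
        self.left = left
        self.depth = depth  # assumed capacity; not taken from the source
        self.fifo: Deque[Tuple[int, str]] = deque()

    def store(self, dest_segment: int, data: str) -> bool:
        """Store one item; refuse it when the FIFO is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append((dest_segment, data))
        return True

    def forward(self) -> Tuple[int, str]:
        """Pop the oldest item and return the neighbouring segment that receives it."""
        dest_segment, data = self.fifo.popleft()
        next_segment = self.left + 1 if dest_segment > self.left else self.left
        return next_segment, data


def transfer(borders: List[SegmentBorder], src: int, dest: int, data: str) -> List[int]:
    """Move `data` from segment `src` to `dest`, one border at a time.

    Returns the list of segments visited, starting with `src`.
    """
    path = [src]
    current = src
    while current != dest:
        # Border i sits between segments i and i + 1.
        border = borders[current] if dest > current else borders[current - 1]
        if not border.store(dest, data):
            raise RuntimeError("border FIFO full")
        current, data = border.forward()
        path.append(current)
    return path


# Three segments (0, 1, 2) are separated by two borders.
borders = [SegmentBorder(0), SegmentBorder(1)]
rightward = transfer(borders, 0, 2, "payload")
leftward = transfer(borders, 2, 0, "payload")
```

In this model a transfer between segments that are not adjacent is stored and forwarded once per border it crosses, in either direction.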

Platform realization

The SegBus platform is a combination of synthesizable behavioral and structural VHDL code, plus schematic descriptions (as for the case of the “Clock Sync” block in BUs). A wide range of parameters allows the adaptation of the platform to multiple design decisions.

The parameters are organized on two hierarchical levels: first, the global parameters, related to the bus level, and then the segment-level parameters, which specify aspects of internal structure. A list of platform parameters
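A two-level parameter hierarchy of this kind could be captured as in the sketch below. The actual parameter list is not reproduced here, so every field name (`data_width`, `masters`, `slaves`) is a hypothetical placeholder, and the border count assumes a linear chain of segments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SegmentParams:
    """Segment-level parameters (all field names are hypothetical)."""
    masters: int
    slaves: int


@dataclass
class PlatformParams:
    """Global, bus-level parameters plus one entry per segment."""
    data_width: int  # hypothetical global parameter
    segments: List[SegmentParams] = field(default_factory=list)

    @property
    def border_count(self) -> int:
        # Assumption: a linear chain of n segments has n - 1 borders.
        return max(len(self.segments) - 1, 0)


config = PlatformParams(
    data_width=32,
    segments=[SegmentParams(2, 3), SegmentParams(1, 2), SegmentParams(1, 1)],
)
```

Keeping the segment entries nested under the global record mirrors the two levels described above: bus-wide choices are made once, and each segment carries its own internal structure.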

Example

In order to assess the predicted performance improvements offered by the SegBus platform, we set up a simulation environment for both a traditional single-bus system and our SegBus approach, and observe whether SegBus improves on the single-bus solution. The system is supposed to perform a certain number of transfers between its constituent modules; together, these transfers define a “global task”.

Briefly, the system is

Simulation results and platform analysis

The whole system has been simulated at the post-synthesis level, in the ModelSim environment [5]. The SegBus implementation (Stratix [2] EP1S30F780C7) runs at 91 MHz (segment 0), 98 MHz (segment 1) and 89 MHz (segment 2), and the central arbitration unit operates at a 40 MHz clock frequency. We have assigned to the single bus clock the fastest of the above frequencies, 98 MHz.
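To put the reported frequencies side by side, the short sketch below converts them to clock periods and picks the single-bus clock as the fastest segment clock, as stated above. The variable and function names are illustrative; only the four frequencies come from the text.

```python
# Clock frequencies reported above, in MHz.
segment_clocks_mhz = {0: 91.0, 1: 98.0, 2: 89.0}
arbiter_clock_mhz = 40.0


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


# The single-bus reference uses the fastest of the segment clocks.
single_bus_clock_mhz = max(segment_clocks_mhz.values())

segment_periods_ns = {seg: period_ns(f) for seg, f in segment_clocks_mhz.items()}
arbiter_period_ns = period_ns(arbiter_clock_mhz)
```

The segment periods all fall between roughly 10 ns and 11.3 ns, while the arbiter period is 25 ns, so the arbiter runs on a clock more than twice as slow as any segment.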

The three-segment case simulation results can be visualized in Fig. 11 – corresponding to the communication matrix described in

Related work and perspectives

Recent years have brought a revival of the segmented (or partitioned, or split) bus topic, as several other approaches have been developed. The common motivation behind these studies is the potential performance improvement offered by such architectures.

Initially presented by Seceleanu et al. [25], with an asynchronous architecture, the SegBus is described here based on a synchronous design model. While in the asynchronous approach the masters have a direct view to their communication partners,

Summary and future work

We have presented the architectural and communication characteristics of a synchronous on-chip segmented bus platform, namely the SegBus. Performance-wise, the platform is placed midway between the classical system bus and the network-on-chip approaches. It provides certain performance improvements in comparison with the former, and employs a much simpler communication structure than those envisioned for the latter.

Future work. It would be interesting to investigate how the proposed communication

Acknowledgements

The author is grateful to Mircea Stan, Ville Leppänen, Olli Nevalainen and to the anonymous reviewers, for their suggestions that helped improve the quality of this article.

References (34)

Altera Corporation, Techniques to Make Clock Switching Glitch Free, White Paper, August...
Altera Corporation, Stratix Device Handbook,...
ARM Limited, AMBA Specification (Rev 2.0),...
American National Standards Institute, Small Computer System Interface (SCSI),...
ModelSim Simulator. Available from:...
W.J. Bainbridge, S.B. Furber, Asynchronous Macrocell interconnect using marble, in: Proceedings of the International...
D.M. Chapiro, Globally-asynchronous locally-synchronous systems, PhD thesis, Stanford University,...
W.J. Dally et al., Digital Systems Engineering, 1998
W.J. Dally et al., Route packets, not wires: On-chip interconnection networks, DAC, 2001
W.J. Dally et al., The torus routing chip, Journal of Distributed Computing, 1986
C. Ewering, Automatic high level synthesis of partitioned busses, ICCAD, 1990
C.-T. Hsieh et al., Architectural power optimization by bus splitting
W.-B. Jone, Design theory and implementation for low-power segmented bus systems, ACM Transactions on Design Automation of Electronic Systems, 2003
S. Kartashev et al., A multicomputer system with dynamic architecture, IEEE Transactions on Computers, 1979
C. Katsinis, A segmented-shared-bus multicomputer architecture, in: Ninth International Conference on Parallel and...
J. Kessels et al., Bridging clock domains by synchronizing the mice in the mousetrap
