A design space exploration methodology for customizing on-chip communication architectures: Towards fractal NoCs

doi:10.1016/j.vlsi.2014.11.007

Integration

Volume 50, June 2015, Pages 158-172

https://doi.org/10.1016/j.vlsi.2014.11.007

Abstract

Recent studies have shown that the On-Chip Interconnect (OCI) architecture is one of the most important components determining the overall performance of future Systems-on-Chip (SoCs). To improve the performance of a specific SoC application domain, the OCI architecture must be optimized at design time or run time. Different OCI-based architectures have recently been proposed; the most recent are fractal-based, or self-similar, topologies. In this paper, we present a customization approach that adds strategic links targeted to match a wide range of application workloads. Simulation results show that this method achieves better performance than the basic OCI architectures. Furthermore, fractal-based OCIs perform well in almost all traffic patterns because of their attractive properties.

Introduction

System-on-Chip (SoC) is a paradigm used in embedded systems, especially for telecommunication and multimedia applications. Advances in integration technology and the scaling down of transistor size now make it possible to integrate numerous components (e.g., Processing Elements) connected by an on-chip communication architecture. The large number of integrated components, and the communication among them, has therefore become a challenging issue. In other words, while gates become faster and more energy efficient as microchip technology shrinks, the wires used for communication between cores remain slow and power-hungry [35]. The On-Chip Interconnect (OCI) infrastructure is thus one of the most important components in determining the overall performance (e.g., latency and throughput), reliability, and cost (e.g., energy consumption and area overhead) of future SoCs. However, the increasing complexity of OCI infrastructures makes their design extremely challenging. Flexible, scalable, and reliable on-chip communication architectures that meet the constraints and requirements of today's SoC applications are therefore required. Real-time applications generate both streaming, or guaranteed-throughput (GT), traffic and best-effort (BE) traffic; the on-chip interconnection system must handle both efficiently.

Recently, Network-on-Chip (NoC) has emerged as an alternative to the non-scalable shared-bus schemes currently used in SoC implementations [1], [6], [11], [18], [43]. Topology is a very important feature in NoC design because the router design depends on its characteristics (e.g., diameter, node degree). The topologies proposed in the literature can be divided into three main categories: bus-based OCIs, circuit-switched OCIs, and packet-switched OCIs (see Fig. 1). GT traffic requires QoS and is handled well by circuit-switched networks and buses [41]. BE traffic is handled well by packet-switched networks [38]. Networks scale better and promise higher communication bandwidth than buses. Like buses, they allow the re-use of standard interface modules for connecting circuit nodes to the network. Network architectures can be divided into two categories: packet-switched and circuit-switched.

Inter-core communication in the majority of SoCs is designed around a bus-based communication infrastructure, in which all cores share one or more buses. Cores are connected to the bus through an interface [28], [36], [37]. A bus arbiter manages communication and contention among cores. In a bus-based design, cores require fewer I/O pins than with direct interconnections, and the cost and area of the wiring required for communication are also reduced. The literature contains many proposals for using buses efficiently, such as hierarchical, segmented, and pipelined buses. Despite these proposals and the advantages mentioned above, shared buses do not scale beyond a certain limit that depends on the number of cores. Moreover, contention for the bus and arbitration slow down data transfer between cores. Scheduling communication streams over non-time-multiplexed channels is easier because, by definition, a stream will not collide with other communication streams. The Æthereal [5] and SoCBUS [10] routers have a large amount of interaction between data streams (both have to guarantee contention-free paths), and determining the static time-slot table requires considerable effort. Because data streams are physically separated, collisions in the crossbar do not occur. Therefore, no buffering or arbitration is needed in the individual router, and an established physical channel can always be used.

In circuit switching, a dedicated connection path (a virtual circuit) between two processing tiles is created before communication starts. Circuit switching is better suited than packet switching to GT traffic, because a large share of the traffic between tiles needs a guaranteed throughput, which is easy to provide over a circuit-switched connection. Current SoCs have a large amount of wiring resources, which gives enough flexibility for streams with different bandwidth demands. The flexibility of packet switching is not needed, because data streams are fixed for a relatively long time, so a connection between two tiles is required for a long period. Once the virtual circuit is established, raw data can be transferred freely between the modules with very low overhead until the virtual circuit is no longer needed, at which time it can be closed. Circuit-switched networks require no overhead for packetisation, packet header processing, or packet buffering. Accessing data across an established circuit-switched connection is no more difficult than accessing a synchronous memory (the requester sends an address and receives the corresponding data after a delay of a few clock cycles). As a result, the circuitry required for a circuit-switched network is relatively simple and appropriate even for small systems. Circuit switching also eases the implementation of asynchronous communication techniques, because data and control can be separated: a control-free pipelined asynchronous data stream does not require much design effort. The flexibility of the circuit-switched approach makes it suitable for a variety of regular or irregular topologies.

Because of the disadvantages of direct interconnections and shared buses, including low scalability and non-adaptability to new applications, packet-switched networks were introduced for designing communication among cores in SoCs [1], [6], [28], [29], [30], [31], [33]. A packet-switched network consists of a network of switches, also called routers. In a packet-switched approach, the data are broken into packets, which are injected into the network and routed independently to their destination. Resources in the network are connected to the routers through a resource network interface. Packet-switched networks often allow for high aggregate system bandwidth, as many packets can be in flight at a given instant. However, they generally require congestion control and packet processing, including buffers to queue packets awaiting the availability of routing resources. This type of on-chip communication infrastructure is highly scalable. It also offers a high potential for reuse, thus reducing time to market and system cost. Different NoC-based architectures using packet switching have been studied and adapted for SoCs. Examples are Fat-Tree (FT), Butterfly Fat-Tree (BFT), Ring, Spidergon, 2D Mesh, Torus, Octagon, FracNoC, and WK [14], [18], [33]. Some designers have also proposed topologies specific to an application or an application area [24]. Packet-switched OCIs can be further divided into sub-categories: Star-Ring Based, Mesh Based, Tree-like Based, and Fractal Based.

Fractal-based OCI topologies are the most recent. Fractal architectures are receiving considerable attention in the networking community [9], [27], [32], [33], [34]. A fractal topology is defined as a geometric structure that demonstrates similarity in properties at various scales, i.e., the structure looks similar under different magnification levels [32]. In [33], a self-similar, fractal-geometry-based triangle topology called FracNoC is proposed for SoCs. In [34], another fractal topology, the WK-recursive network, is presented. Fractal-based topologies have attractive properties, such as a high degree of regularity, efficient communication performance at low energy consumption, and ease of extension, that suit NoC systems.

More precisely, a fractal structure reproduces itself iteratively, exhibiting invariant structural properties. In other words, a fractal describes a self-organizing mechanism. For example, a WK-recursive network with amplitude $W$ and level $L$, denoted by $WK(W, L)$, can be constructed recursively. A $WK(W, 0)$ is a vertex with $W$ free edges. A $WK(W, 1)$ is a $W$-vertex complete graph, denoted by $K_{W}$, in which each vertex has one free edge and $W - 1$ edges that connect it to the other vertices. A $WK(W, H)$ consists of $W$ copies of $WK(W, H - 1)$ as supervertices, and the $W$ supervertices are connected as a $K_{W}$, where $2 \leq H \leq L$. By induction, $WK(W, L)$ has $W^{L}$ vertices and $W$ free edges. Consequently, for any specified degree $W$, WK-recursive networks can be expanded to an arbitrary level $L$ without reconfiguring the edges. The topologies of $WK(4, 0)$, $WK(4, 1)$, and $WK(4, 2)$ are depicted in Fig. 2.
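The recursive construction can be made concrete with a short sketch. It assumes the usual digit-string labelling of WK-recursive networks, which is not spelled out in the text above: each vertex is an $L$-digit string over $\{0, \dots, W-1\}$, vertices that differ only in the last digit form a $K_{W}$, and a vertex whose label ends in a digit $a$ followed by $j$ copies of $b \neq a$ is linked to the vertex ending in $b$ followed by $j$ copies of $a$. The function name `wk_recursive` is hypothetical, and the sketch covers $L \geq 1$ only.

```python
from itertools import product


def wk_recursive(W, L):
    """Build WK(W, L) for L >= 1 under the digit-string labelling (an assumption).

    Returns (vertices, edges, free), where `free` lists the vertices that
    keep one unconnected (free) edge.
    """
    if W < 2 or L < 1:
        raise ValueError("this sketch needs W >= 2 and L >= 1")
    vertices = list(product(range(W), repeat=L))
    edges = set()
    for v in vertices:
        last = v[-1]
        # Edges inside the innermost K_W: change only the last digit.
        for b in range(W):
            if b != last:
                edges.add(frozenset((v, v[:-1] + (b,))))
        # Length j of the run of equal digits at the end of the label.
        j = 1
        while j < L and v[-1 - j] == last:
            j += 1
        if j < L:
            # Label is prefix + a + last^j with a != last; its partner
            # across supervertices is prefix + last + a^j.
            a = v[-1 - j]
            partner = v[:L - 1 - j] + (last,) + (a,) * j
            edges.add(frozenset((v, partner)))
    # Labels made of a single repeated digit keep their free edge.
    free = [v for v in vertices if len(set(v)) == 1]
    return vertices, edges, free


if __name__ == "__main__":
    vertices, edges, free = wk_recursive(4, 2)
    print(len(vertices), len(edges), len(free))
```

For $WK(4, 2)$ this yields 16 vertices, 30 internal links (four copies of $K_{4}$ plus the six links between supervertices), and 4 free edges, in line with the $W^{L}$ vertex count and $W$ free edges stated above.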

A FracNoC network topology is a fractal, denoted FracNoC$(k)$, and is described by an expansion level $k$ [33] (see Fig. 3). A FracNoC network is defined as follows:

For $k = 0$ there is $N_{0} = 1$ node, with a maximum diameter $D_{0} = 0$, and it holds $P_{0} = 0$ links.

For $k = 1$ there are $N_{1} = 4$ nodes, with a maximum diameter $D_{1} = 2$, and it holds $P_{1} = 5$ links.

For each $k > 1$ there are $N_{k} = 4 N_{k - 1}$ nodes, with a maximum diameter $D_{k} = 2 (D_{k - 1} + 1)$, and it holds $P_{k} = 4 P_{k - 1} + 13$ links.

Here $N_{k}$ is the total number of nodes, $P_{k}$ is the total number of links, and $D_{k}$ is the maximum diameter of FracNoC$(k)$. This family of topologies starts from FracNoC$(0)$ and recursively expands to any level $k$, as illustrated in Fig. 3. It is worth noting that both WK and FracNoC OCIs have several attractive properties, such as self-similarity, reiteration, expandability, and regularity, that make them well suited to NoC design. More precisely, the geometrical structures of WK and FracNoC are factorisable, as shown for instance in Figs. 2 and 3. One interpretation of this phenomenon is that factorization represents a change of scale: prime units combine into a coherent, self-similar structure, and the higher we go in scale, the more complex the structure becomes. In other words, a fractal structure is based on a pattern that repeats itself and can be described by a recursive definition. This is the case for FracNoC, but not for 2D Mesh and X-Mesh, since such OCIs can grow by adding individual rows or columns. A structure may also be self-similar without being fractal, and may be fully defined without any need for recursion (e.g., a straight line). Thus, a 2D Mesh or X-Mesh could be considered fractal only if it were constructed as a self-similar iterated structure (i.e., by recursion), which does not correspond to its formal definition, unless we consider it a sub-case of the general class of mesh structures. This is what justifies the proposed classification illustrated in Fig. 1.
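To see how quickly the FracNoC parameters grow, the recurrences can be evaluated directly. The sketch below is illustrative only: it reads the link recurrence as $P_{k} = 4 P_{k-1} + 13$ for $k > 1$ (an assumption that the right-hand index is $k - 1$, matching the $N_{k}$ and $D_{k}$ recurrences), and the function name `fracnoc_params` is hypothetical.

```python
def fracnoc_params(k):
    """Return (N_k, D_k, P_k) for FracNoC(k) from the stated recurrences.

    Assumption: the link recurrence is P_k = 4 * P_{k-1} + 13 for k > 1.
    """
    if k < 0:
        raise ValueError("expansion level k must be non-negative")
    if k == 0:
        return 1, 0, 0          # base case: a single node, no links
    nodes, diameter, links = 4, 2, 5   # FracNoC(1)
    for _ in range(2, k + 1):
        nodes = 4 * nodes
        diameter = 2 * (diameter + 1)
        links = 4 * links + 13
    return nodes, diameter, links


if __name__ == "__main__":
    for level in range(5):
        n, d, p = fracnoc_params(level)
        print(f"k={level}: N={n}, D={d}, P={p}")
```

Under that reading, FracNoC$(2)$ has 16 nodes, diameter 6, and 33 links, and the node count follows $N_{k} = 4^{k}$.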

While these on-chip interconnect architectures draw on concepts inherited from parallel and distributed systems to interconnect IP cores in a structured and scalable way, they may suffer from higher latency, lower throughput, and higher power consumption when adapted for large NoCs [11], [13], [14], [18], [42].

Many studies have shown that, to improve the performance of a specific application domain, OCI architectures have to be customized at design time by, for example, inserting additional long-range links between switches or allocating only the required buffer size [1], [13], [23]. These approaches are generally tailored to a specific application, yielding an application-specific SoC [24]. They deal with selecting an OCI architecture that accommodates the expected application-specific traffic pattern during the early design-space exploration phase. Furthermore, for dynamic SoCs, in which the traffic pattern is not known or predictable in advance, an augmented OCI is required to handle applications with unpredictable workloads and, consequently, unpredictable changes in active tasks and communication requirements.

In this paper, we present an approach that allows designers to customize a candidate on-chip interconnect architecture to match a large application-specific workload. The objective is to explore and evaluate the performance of a candidate OCI architecture and then customize it to best suit the workloads of several applications. A major challenge is choosing the control parameters that yield a NoC appropriate to the needs of many applications. Because of the size of the design space (topology, paths, routing, etc.), it is not possible to explore all options for the different trade-offs. Methodologies and tools are therefore needed to customize the architecture so that the resulting NoC is tailored to the needs of different types of application traffic. To customize the NoC topology without resorting to a traffic model, we chose to improve the physical characteristics of the topology, which necessarily have a direct impact on communication performance.

Our on-chip interconnect customization algorithm improves the physical properties of the topology by adding strategic links. However, it is difficult to choose wisely which links to place on the NoC without knowing the communication traffic patterns in advance. Among the physical characteristics of the NoC topology that seem most significant, we chose to work on the average distance (denoted Dm) and the degree of clustering (denoted Cn). Specifically, our strategy is to maximize the degree of clustering (Cn) and minimize the average distance (Dm), so the goal is to find a compromise that improves both measures in a given initial NoC. Furthermore, the results show that fractal OCIs (e.g., Mesh, X-Mesh, WK, FracNoC) outperform non-fractal ones.
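The two measures can be computed on any candidate topology with a few lines of code. The following is a minimal sketch under stated assumptions: Dm is taken to be the mean shortest-path hop count over all ordered pairs of distinct nodes, and Cn is taken to be the average local clustering coefficient; the text above does not give the exact formulas, so both definitions, as well as the helper names (`mesh_2d`, `average_distance`, `clustering_degree`), are illustrative choices rather than the authors' definitions.

```python
from collections import deque
from itertools import combinations


def mesh_2d(rows, cols):
    """Adjacency sets of a rows x cols 2D mesh (nodes are (row, col) tuples)."""
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r, c in list(adj):
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].add(nb)
                adj[nb].add((r, c))
    return adj


def bfs_distances(adj, src):
    """Hop counts from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj):
    """Dm (assumed definition): mean hop count over ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


def clustering_degree(adj):
    """Cn (assumed definition): average local clustering coefficient."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * linked / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


if __name__ == "__main__":
    mesh = mesh_2d(3, 3)
    print(average_distance(mesh), clustering_degree(mesh))
```

A plain 3x3 mesh has Dm = 2.0 and Cn = 0 under these definitions, because a mesh contains no triangles; any added link that closes a triangle raises Cn while also shortening some paths.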

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 describes the approach for customizing on-chip interconnect architectures. Section 4 presents a case study using six OCI architectures, together with performance evaluation results that show the effectiveness of the proposed approach. Section 5 presents a summary and discussion of the OCI customization approach. Conclusions and future work are given in Section 6.

Section snippets

Related work

On-chip interconnect architectures adopted for SoCs consist of a number of interconnected IP cores (e.g., CPU, DSP, memories) that communicate via an OCI. They are characterized by different trade-offs with regard to latency, throughput, communication load, energy consumption, and silicon area requirements. Several approaches have been proposed for NoC design, and they can be classified into two main categories: design-time approaches and run-time approaches. Design-time approaches are

OCI criteria

Several OCI-based architectures (e.g., 2D mesh, Spidergon) have recently been studied and adapted for SoCs. These OCIs differ in their features according to the following criteria [3], [13], [14], [18]:

• The diameter: the largest number of hops among all shortest paths. This criterion can be used as an indicator of the maximum latency. A small diameter allows fast communication between the farthest nodes. In other words, the maximum delay is proportional to the maximum number of hops, and then

OCI customization approach

In this section, we present a design space exploration methodology for customizing, or tuning, a candidate on-chip interconnect architecture given a resource budget, independently of any particular application traffic pattern. The approach therefore customizes the OCI architecture according to the physical properties of the topology so that it can support as many types of SoC applications as possible, i.e., it is adapted for general-purpose use.
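One way to picture traffic-independent customization under a link budget is a greedy search over candidate links. The sketch below is not the authors' algorithm, which is not reproduced in this snippet; it is a hypothetical illustration that, at each step, adds the missing link giving the best value of a clustering coefficient minus `weight` times the average hop distance. The scoring function, the `weight` parameter, and the names `ring`, `avg_distance`, `clustering`, and `customize` are all assumptions.

```python
from collections import deque
from itertools import combinations


def ring(n):
    """Adjacency sets of an n-node ring, used here as a toy starting topology."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def avg_distance(adj):
    """Mean hop count over ordered pairs of distinct nodes (connected graph)."""
    total = pairs = 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering(adj):
    """Average local clustering coefficient."""
    acc = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            acc += 2.0 * linked / (k * (k - 1))
    return acc / len(adj)


def customize(adj, budget, weight=1.0):
    """Greedily add up to `budget` links; returns (new adjacency, added links).

    Hypothetical score: clustering - weight * average distance.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    added = []
    for _ in range(budget):
        best = None
        for u, v in combinations(sorted(adj), 2):
            if v in adj[u]:
                continue
            # Tentatively add the link, score the topology, then undo.
            adj[u].add(v)
            adj[v].add(u)
            score = clustering(adj) - weight * avg_distance(adj)
            adj[u].discard(v)
            adj[v].discard(u)
            if best is None or score > best[0]:
                best = (score, u, v)
        if best is None:  # topology is already fully connected
            break
        _, u, v = best
        adj[u].add(v)
        adj[v].add(u)
        added.append((u, v))
    return adj, added


if __name__ == "__main__":
    custom, links = customize(ring(6), budget=2)
    print(links, avg_distance(custom), clustering(custom))
```

The exhaustive rescoring of every candidate link is only practical for small topologies; a real design flow would also have to account for constraints such as wire length and router port count, which this sketch ignores.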

It should be noted that the customization approach

Evaluation study

We have conducted simulations using NIRGAM to show how maximizing the Clustering-Degree and minimizing the Average-Distance influence the performance of an SoC design. NIRGAM is a SystemC-based, cycle-accurate simulator for evaluating NoCs. It provides substantial support for experimenting with NoC designs in terms of routing algorithms, applications, switching techniques, virtual channels, traffic patterns, and buffer sizes on various OCI topologies [22].

Nirgam simulator, however, in its original version

Conclusions and future work

In this paper, we introduced an approach for customizing on-chip interconnects for SoC applications. The objective of this work is to build a framework that combines a customization approach with the NIRGAM simulation tool for customizing OCIs at design time without considering application traffic patterns. Simulation results show that customizing OCIs, based on available resources, achieves better performance than the basic OCI architectures.

According to this evaluation study,

Abderrahim Chariete received his Ph.D. (2014) degree in computer science and computer engineering from the University of Technology of Belfort-Montbéliard (UTBM) and the master research (2010) degree in computer science (communicating embedded systems) from the National School of Engineering of South Alsace (ENSISA) of the University of High Alsace (UHA), Mulhouse – France. His research interests include communicating embedded systems, distributed/parallel algorithms, interconnection networks

References (43)

C. Neeb et al., Designing efficient irregular networks for heterogeneous systems-on-chip, J. Syst. Archit. (2008)
L.K. Gallos et al., A review of fractality and self-similarity in complex networks, Phys. A: Stat. Mech. Appl. (2007)
L. Benini et al., Networks on chips: a new SoC paradigm, IEEE Comput. (2002)
E. Bolotin et al., QNoC: QoS architecture and design process for network on chip, J. Syst. Archit. (2004)
L. Bononi, N. Concer, Simulation and analysis of network on chip architectures: ring, spidergon, and 2D mesh, in:...
M. Coppola, R. Locatelli, G. Maruccia, L. Pieralisi, A. Scandurra, Spidergon: a novel on-chip communication network,...
IBM, Cell Broadband Engine Programming Handbook,...
S. Kumar et al., A network on chip architecture and design methodology, in: Proceedings of the IEEE Computer Society...
R. Marculescu et al., The chip is the network: toward a science of network-on-chip design, Found. Trends Electron. Des. Autom. (2007)
S. Murali et al., Synthesis of predictable networks-on-chip-based interconnect architectures for chip multiprocessors, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2007)
P. Bogdan, R. Marculescu, Statistical physics approaches for network-on-chip traffic characterization, in: Proceedings...
U.Y. Ogras, J. Hu, R. Marculescu, Key research problems in NoC design: a holistic perspective, in: Proceedings of...
U.Y. Ogras, R. Marculescu, Application-specific network-on-chip architecture customization via long-range link...
U.Y. Ogras et al., It's a small world after all: NoC performance optimization via long-range link insertion, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2006)
P.P. Pande et al., Performance evaluation and design tradeoffs for network-on-chip interconnect architectures, IEEE Trans. Comput. (2005)
A. Pinto, L.P. Carloni, A.L. Sangiovanni-Vincentelli, Efficient synthesis of networks on chip, in: Proceedings of the...
K. Srinivasan, K. Chatha, Isis: a genetic algorithm based technique for custom on-chip interconnection network...
K. Srinivasan et al., Linear programming based techniques for synthesis of network on chip architectures, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2006)
S. Suboh et al., An interconnection architecture for network-on-chip systems, Telecommun. Syst. (2008)
S. Suboh, M. Bakhouya, S. Lopez-Buedo, T. El-Ghazawi, Simulation-based approach for evaluating on-chip interconnect...
J. Xu et al., A design methodology for application-specific networks-on-chip, ACM Trans. Embed. Comput. Syst. (2006)


Mohamed Bakhouya is an associate professor-HDR at International University of Rabat. He received his HDR (2013) from University of High Alsace and the Ph.D. (2005) degree in computer science and computer engineering from the UTBM, France. His research interests include various aspects on the design, validation, implementation, performance evaluation and analysis of distributed systems, architectures, protocols and services.

Jaafar Gaber received his Ph.D. (1998) degree in computer science from University of Science and Technology of Lille, France. He is currently an associate professor of computational sciences and computer engineering at the UTBM, France. Prior to joining UTBM, he was a research scientist at the institute of computational sciences and informatics (CSI) in George Mason University, USA. His research interests include high-performance computing, distributed data mining, biocomputing, distributed algorithms and mobile computing.

Maxime Wack received his Ph.D. (1981) degree in computer science from the University of Technology in Compiegne (UTC), France. Currently, he is associate professor-HDR at the optimization and networks laboratory (Opera) at the UTBM. His research interests include information systems, ubiquitous and pervasive computing, and distributed systems and communications.
