Congestion-aware core mapping for Network-on-Chip based systems using betweenness centrality

https://doi.org/10.1016/j.future.2016.12.031

Highlights

  • Present a novel application of the edge betweenness centrality metric for NoCs.

  • Identify highly loaded NoC links by capturing the dynamic characteristics of core traffic.

  • Apply edge betweenness centrality to capture the operational characteristics of the system.

  • Propose a congestion-aware core mapping heuristic to alleviate network contention.

Abstract

Network congestion has a significant impact on application performance and network throughput in Network-on-Chip (NoC) based systems. Efficient core mapping can substantially reduce network contention and end-to-end latency, leading to improved application performance in NoC based multicore systems. In this work, we propose a Congestion-Aware (CA) core mapping heuristic based on the betweenness centrality metric. The proposed CA algorithm optimizes core mapping using the betweenness centrality of links to alleviate congestion on highly loaded NoC links. We use a modified betweenness centrality metric to identify highly loaded NoC links that are more prone to congestion. In contrast to the traditional betweenness centrality metric, which is generally used to measure the structural/static characteristics of a system, the adapted metric utilizes the volume of communication traversing the edges (NoC links) to capture the operational and dynamic characteristics of the system. The experimental results demonstrate that the proposed algorithm achieves significantly lower average channel load and end-to-end latency than the baseline First Fit (FF) and Nearest Neighbor (NN) core mapping algorithms. In particular, the CA algorithm achieves up to 46% lower channel load and 12% lower end-to-end latency than the FF algorithm. Moreover, the proposed algorithm reduces network energy consumption by an average of 32% compared to the baseline configuration.

Introduction

Emerging big data applications are characterized by massive datasets, a high degree of parallelism, and strict power and performance constraints that differ significantly from traditional high performance computing (HPC) and desktop application workloads. The business potential of these big data applications, coupled with cloud services, serves as a driving force for the development of innovative computer architectures. For instance, to address the challenges of such emerging applications, Intel developed the Single-chip Cloud Computer (SCC), an experimental system based on a multicore architecture [1]. Similarly, Tilera developed chips with 64 and 100 cores connected through an on-chip network [2], [3]. Due to advancements in integration technologies and silicon chip design, future Multi-Processor System-on-Chip (MPSoC) systems are anticipated to comprise hundreds or even thousands of processing cores, also termed Processing Elements (PEs) [4], [5], [6]. Multicore architectures such as the SCC promise tremendous compute power, which can be leveraged by efficiently splitting the workload across multiple processing cores. Ideally, the division of workload should lead to linear speedup, proportional to the number of cores on the chip. In practice, however, as the number of processing cores increases, the raw compute performance can be overshadowed by overheads such as network communication among the interdependent tasks executing on different cores. Therefore, efficient communication mechanisms and interconnect design play a vital role in achieving high performance and energy efficiency in future multicore architectures.

For multicore systems with a large number of PEs, the Network-on-Chip (NoC) has emerged as a scalable alternative to traditional bus based architectures [7], [8]. In traditional MPSoCs, a shared bus or global wires are used for communication among the PEs. In contrast, NoC based MPSoCs use a network of shared links connected through routers in different topologies, such as mesh, torus, or tree. The advantage of a NoC based architecture is that multiple PEs can communicate simultaneously over different paths. Ideally, parallel communications between different pairs of PEs should not interfere with each other, i.e., the paths should be contention free. In practice, however, completely contention-free paths are very difficult to achieve. Consequently, congestion within the NoC can significantly affect overall application performance, especially network latency and system throughput [9], [10]. Reducing network contention in NoC based multicore processor systems is crucial, as contention can delay the startup and/or completion of tasks that must wait for data to arrive before they can start execution [8]. Network contention generally arises when a communication channel is occupied by one packet and other packets must wait until the required channel becomes available. Most previous studies considered mapping a single task onto each processing core and conducted their analysis over a single mesh NoC [4], [10], [11], [12], [13], [14], [15]. Moreover, very few works consider both task and core mapping to reduce network contention in NoC based multicore systems. In this work, we consider mapping multiple tasks onto each core. We demonstrate the effectiveness and performance of the proposed scheme through extensive simulations with mesh sizes ranging from 5×5 to 10×10.

In this work, we conduct a congestion analysis of our previously proposed communication- and energy-aware task packing and core mapping algorithms [16]. Moreover, we propose a congestion-aware core mapping heuristic with the objective of reducing network contention and average end-to-end latency. The proposed congestion-aware core mapping algorithm first identifies the highly loaded NoC links that can potentially cause congestion within the network. Core mapping is then optimized to alleviate network load from these highly loaded links, leading to reduced network contention and delay. To identify the highly loaded NoC links that are more prone to congestion, we apply a modified betweenness centrality metric. In contrast to the traditional betweenness centrality metric, which is generally used to measure the structural characteristics of a system, the adapted edge betweenness centrality metric utilizes the volume of communication traversing the edges (NoC links) to capture and exploit the dynamic behavioral characteristics of the system. For instance, the authors in [17], [18] used betweenness centrality to model the structural behavior of Data Center Networks (DCNs), and showed that legacy structural metrics such as betweenness centrality are not sufficient to capture the operational and dynamic behavior of DCNs. Therefore, we utilize a modified betweenness centrality metric to capture the dynamic and operational behavior of NoC based multicore systems.
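
To make the distinction between structural and traffic-aware centrality concrete, the following minimal Python sketch accumulates communication volume over the links that each flow traverses, yielding a volume-weighted variant of edge betweenness. The XY dimension-order routing and the simple traffic dictionary are illustrative assumptions; the metric actually used in this work is the one defined later in Eq. (7).

```python
# Minimal sketch: traffic-weighted edge "betweenness" for a 2D mesh NoC.
# Assumptions (not from the paper): XY dimension-order routing, unit-length
# links, and a traffic matrix giving the communication volume between tiles.

from collections import defaultdict

def xy_route(src, dst):
    """Return the list of directed links (hop pairs) from src to dst
    under XY routing; src and dst are (x, y) tile coordinates."""
    x, y = src
    path = []
    while x != dst[0]:                      # route along X first
        nx = x + (1 if dst[0] > x else -1)
        path.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:                      # then along Y
        ny = y + (1 if dst[1] > y else -1)
        path.append(((x, y), (x, ny)))
        y = ny
    return path

def weighted_edge_centrality(traffic):
    """traffic: dict mapping (src, dst) tile pairs to communication volume.
    Returns the total volume crossing each NoC link, i.e. a volume-weighted
    variant of edge betweenness under deterministic routing."""
    load = defaultdict(float)
    for (src, dst), volume in traffic.items():
        for link in xy_route(src, dst):
            load[link] += volume
    return load

# Example: three flows on a 3x3 mesh (hypothetical volumes).
traffic = {((0, 0), (2, 0)): 64, ((0, 0), (0, 2)): 32, ((1, 0), (2, 0)): 16}
hot = sorted(weighted_edge_centrality(traffic).items(),
             key=lambda kv: kv[1], reverse=True)
print(hot[:3])   # links most prone to congestion
```

Under deterministic routing, the links with the highest accumulated volume are the ones most prone to congestion, which is precisely the information the proposed CA heuristic exploits.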

The major contributions of this article are summarized as follows.

  • We propose a novel application of the edge betweenness centrality metric to capture the operational and dynamic characteristics of NoC traffic and to identify highly loaded NoC links that can potentially cause congestion.

  • We propose a congestion-aware core mapping heuristic that optimizes core mapping so as to alleviate network contention and the resulting delay within the NoC.

  • We conduct extensive simulations to analyze the effectiveness of the proposed scheme with NoC sizes ranging from 5×5 to 10×10 meshes. Performance parameters used to evaluate the proposed congestion-aware core mapping heuristic include: (a) channel load, (b) end-to-end latency, (c) NoC energy consumption, and (d) workflow completion time.

The rest of the article is organized as follows. Section 2 provides an overview of prior work closely related to the proposed work. System models and the problem formulation are discussed in Section 3. The proposed congestion-aware algorithm is presented in Section 4. Experimental results are presented and discussed in Section 5. Concluding remarks and future directions are provided in Section 6.

Related work

Congestion-aware task and core mapping for NoC based multicore processor systems is a major challenge and has gained significant attention in the recent past. Various solutions have been proposed in the scientific literature [4], [10], [11], [12], [13], [14], [15], [19], [20], [21]. In this section, we briefly discuss the proposals that are most closely related to this work.

To reduce network contention, Carvalho et al. presented various congestion-aware task mapping

Application

An application is generally represented in the form of a Task Graph TG(T, C), as shown in Fig. 1. Each node in TG represents a task t_i ∈ T, while edges between nodes represent the volume of data that needs to be exchanged between tasks. The set of all edges is denoted by C, and each edge c_{j,k} ∈ C signifies the communication dependency between t_j and t_k. The communication volume between t_j and t_k is captured by the weight w_{j,k} of c_{j,k}. For instance, as shown in Fig. 1, the
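
As an illustration of this model, the following minimal Python sketch stores the task set T and the weighted communication edges C. The task names and volumes in the example are hypothetical and are not taken from Fig. 1.

```python
# Minimal sketch of the task graph model TG(T, C): tasks as nodes and
# communication dependencies as weighted directed edges.

from dataclasses import dataclass, field

@dataclass
class TaskGraph:
    tasks: set = field(default_factory=set)    # T: set of task ids
    comm: dict = field(default_factory=dict)   # C: (t_j, t_k) -> w_{j,k}

    def add_edge(self, t_j, t_k, volume):
        """Record that t_j sends `volume` units of data to t_k."""
        self.tasks.update((t_j, t_k))
        self.comm[(t_j, t_k)] = volume

    def volume(self, t_j, t_k):
        """Communication volume w_{j,k}, or 0 if the tasks do not communicate."""
        return self.comm.get((t_j, t_k), 0)

# Hypothetical example application with four tasks.
tg = TaskGraph()
tg.add_edge("t1", "t2", 128)   # w_{1,2} = 128
tg.add_edge("t1", "t3", 64)
tg.add_edge("t2", "t4", 32)
print(tg.volume("t1", "t2"))   # -> 128
```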

Proposed heuristic

To address the aforementioned problem, we propose a Congestion-Aware (CA) core mapping algorithm, presented in Table 2. The CA algorithm starts by calculating the edge betweenness centrality for all edges using Eq. (7), taking the communication volume into account. Next, all edges are sorted in descending order of betweenness centrality. Each edge is then evaluated in sorted order. For each edge, the algorithm calculates the current network load on the edge by taking the sum
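
The skeleton below sketches this loop in Python under stated assumptions: the centrality values are assumed to come from a volume-weighted computation such as Eq. (7), the congestion threshold is illustrative, and remap is a hypothetical placeholder for the core re-mapping step of Table 2, which is not reproduced in this snippet.

```python
# High-level sketch of the CA core mapping loop described above.
# This is not the paper's exact procedure; the remapping step is a placeholder.

def ca_core_mapping(edges, edge_centrality, current_load, threshold, remap):
    """
    edges:            iterable of NoC links
    edge_centrality:  dict, link -> volume-weighted betweenness (cf. Eq. (7))
    current_load:     callable, link -> sum of traffic currently on the link
    threshold:        load level above which a link is treated as congested
    remap:            callable, link -> None; placeholder that moves
                      contributing cores/flows onto less loaded paths
    """
    # Evaluate links from most to least central.
    for link in sorted(edges, key=lambda e: edge_centrality[e], reverse=True):
        if current_load(link) > threshold:
            remap(link)   # alleviate load from the congestion-prone link
```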

Experimental setup

Performance evaluation of the proposed algorithms has been conducted through extensive simulations carried out using the Heterogeneous Network-on-Chip Simulator (HNOCS) [30]. Table 3 presents the parameter values used for the various NoC components in HNOCS. A large number of simulation scenarios have been generated by varying application parameters such as the number of tasks, the computation-to-communication ratio (CCR), and the number of processing cores. Simulation experiments have been conducted with

Conclusions and future work

We proposed a congestion-aware core mapping algorithm aimed at reducing network contention and end-to-end latency. The proposed algorithm utilizes a modified betweenness centrality metric to identify the NoC links that are more prone to network congestion. To reduce congestion, the CA algorithm optimizes the core mapping in such a way as to alleviate network load from the links that are prone to contention. Experimental results from extensive simulations showed the

Acknowledgments

Dr. Bilal’s research was partially funded by the National Research Foundation of Qatar (NPRP Grant No. 8-519-1-108).

References (32)

  • C. Wang, A dynamic contention-aware application allocation algorithm for many-core processor.

  • X. Yu, Staring into the abyss: An evaluation of concurrency control with one thousand cores, Proc. VLDB Endow. (2014).

  • D.R. Johnson, Rigel: A 1024-core single-chip accelerator architecture, IEEE Micro (2011).

  • J.-J. Han, Contention-aware energy management scheme for NoC-based multicore real-time systems, IEEE Trans. Parallel Distrib. Syst. (2015).

  • G. Kim, An augmented reality processor with a congestion-aware network-on-chip scheduler, IEEE Micro (2014).

  • H.-L. Chao, Congestion-aware scheduling for NoC-based reconfigurable systems.

Tahir Maqsood received his B.S. in Computer Science from PIMSAT, Pakistan, in 2001, a Master in Computer Science from KASBIT, Pakistan, in 2004, and an M.S. in Computer Networks from Northumbria University, UK, in 2007. He is currently a Ph.D. candidate in the Department of Computer Science, COMSATS Institute of Information Technology, Abbottabad, Pakistan. His research interests include task and core mapping, energy efficient systems, and network performance evaluation.

Kashif Bilal is an Assistant Professor at COMSATS Institute of Information Technology, Abbottabad, Pakistan. He received his Ph.D. in Electrical and Computer Engineering from North Dakota State University, Fargo, USA, in 2014. He also received the College of Engineering (CoE) Graduate Student Researcher of the Year 2014 award at NDSU. His research interests include energy efficient high speed networks, green computing, and robustness in data centers. Currently, he is focusing on the exploration of network traffic patterns in real data centers and the development of a data center network workload generator.

Sajjad A. Madani works at COMSATS Institute of Information Technology as an Associate Professor. He received his M.S. in Computer Sciences from Lahore University of Management Sciences and his Ph.D. from Vienna University of Technology. His areas of interest include low power wireless sensor networks and green computing. He has published more than 40 papers in peer reviewed international conferences and journals.
