CFPA: Congestion aware, fault tolerant and process variation aware adaptive routing algorithm for asynchronous Networks-on-Chip

doi:10.1016/j.jpdc.2019.03.001

Journal of Parallel and Distributed Computing

Volume 128, June 2019, Pages 151-166

https://doi.org/10.1016/j.jpdc.2019.03.001

Highlights

• A new routing algorithm (called CFPA) for networks-on-chip is proposed.
• CFPA is congestion-aware, process-variation-delay-aware and fault-tolerant.
• CFPA provides significant improvement in throughput, delay and fault-tolerance.

Abstract

Delays caused by congestion, faults and process variation (PV) degrade networks-on-chip (NoC) performance. A congestion aware, fault tolerant and process variation aware adaptive routing algorithm (CFPA) is introduced for congested and faulty asynchronous NoCs. The proposed routing algorithm maintains two routing tables to determine the packet path: one for routing directions based on propagation delay (including PV delay), and the other to keep track of the queuing delays at each router port. The queuing delay is used as an indicator of congestion. The proposed routing tables store multiple paths to every destination via all polar directions, which makes CFPA fault tolerant in case of path failures. The proposed algorithm is verified against other popular routing algorithms for NoCs with different topologies and network dimensions. On average, CFPA enhances the NoC throughput by 60% compared with recently proposed routing algorithms. With CFPA, the impact of faults on NoC throughput is alleviated by 48%. In addition, the average delay of messages routed using CFPA is 26% to 75% shorter than that of other algorithms under process variation conditions. Furthermore, the proposed algorithm limits the impact of PV on NoC throughput to less than 5% of the nominal throughput for mesh topology.

Introduction

Recent advances in digital electronics have made the System-on-Chip (SoC) a reality. An SoC allows designers to integrate a large number of processing elements (PEs) into a single chip [38]. In an SoC, PEs are interconnected via buses. As SoCs become more complex, designers face scalability and power dissipation issues. The Network-on-Chip (NoC) has been proposed as a solution to this interconnection limitation. In the NoC paradigm, the bus-control logic of the bus interconnect is replaced with routers, which make NoCs smarter and more scalable.

Similar to computer networks, a NoC employs routing algorithms to forward data flits (the packet segments of a NoC) among routers to their respective destinations. Adaptive routing improves NoC performance and provides tolerance to link and node failures. Despite their overhead and implementation complexity, adaptive algorithms are preferable in NoCs because of their superior throughput and delay performance. A key role of the routing algorithm is to avoid congestion and to balance the traffic load among the links of the overall network [23]. In addition, fault tolerance is needed to avoid deadlocks and data loss.

Manufacturing variations (e.g. threshold voltage and gate length variations) cause process parameters to deviate from their nominal values, which is known as process variation (PV). As technology scales down, PV becomes more severe. However, asynchronous design tolerates PV better under technology scaling. PV effects can be mitigated using advanced routing techniques. Communication channels connecting network nodes have interconnect delays that contribute to the packet propagation delay, and hence negatively affect overall NoC performance [18]. Many techniques have been proposed to mitigate the impact of PV [22], [39]. Most of them ignore the routing mechanism, which is strongly affected by the PV delays and the resulting congestion.

Most computer network routing protocols fall into two main categories: link state (LS) routing and distance vector (DV) routing [30]. To the best of our knowledge, neither DV nor LS algorithms have been proposed for asynchronous NoCs. Customized versions of LS and DV protocols are proposed in [1], [35] to mitigate the communication complexity and overhead of synchronous NoCs. Customized LS exchanges only updated link status, without frequent broadcasts over the network. An asynchronous router for time division multiplexed NoC is proposed in [20]. In [29], an asynchronous NoC router employing an adaptive routing algorithm is proposed to support communication with the synchronous processing element (PE). An asynchronous NoC design based on source routing is introduced in [33]. The work in [33] presented a timing organization analysis for Globally Asynchronous Locally Synchronous (GALS) and asynchronous architectures [17]. In [41], reconfigurable routing is proposed for fault handling in 2-D NoCs, with noticeable area overhead. With the advance of today’s fabrication technology, considering the impact of router port queuing delay and process variation on asynchronous NoC performance has become an essential design need. Not all of the mentioned algorithms address these two essential performance-limiting factors.

Fig. 1 depicts a NoC consisting of nine cores. In particular, nine PEs are interconnected via routers using a mesh topology, where every PE is associated with its own router. The proposed routing algorithm is to be deployed in the routing unit of the NoC router, based on the asynchronous router architecture proposed in [6]. Handshaking protocol signals are used to synchronize message transmission among routers [28]. Flits traveling from their sources to their respective destinations experience a propagation delay $D_{P}$ and a queuing delay $D_{Q}$. The role of the routing algorithm is to select the best path for data flits so as to minimize the impact of congestion and PV on $D_{P}$ and $D_{Q}$, and to tolerate runtime faults.
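As a rough illustration of this path selection, the sketch below picks an output port by minimizing a combined delay estimate over the candidate directions. The table layout, the names `pd_table`, `qd_table` and `select_port`, the sample values, and the simple additive cost $D_{P} + D_{Q}$ are all assumptions made for illustration; the actual table format and selection rule are those defined in Sections 4 and 5 of the paper.

```python
import math

# Hypothetical per-router tables: destination -> {port: delay estimate}.
# Values are illustrative and in arbitrary time units.
pd_table = {"R8": {"N": 4.0, "E": 3.5, "S": math.inf, "W": 6.0}}  # D_P, including PV delay
qd_table = {"R8": {"N": 0.5, "E": 2.0, "S": 0.0, "W": 0.1}}       # D_Q per port


def select_port(dest, pd_table, qd_table):
    """Return (port, cost) minimizing the assumed cost D_P + D_Q.

    An infinite propagation delay marks a path that is currently unusable,
    so it is never selected. Returns (None, inf) if every path is unusable.
    """
    best_port, best_cost = None, math.inf
    for port, d_p in pd_table[dest].items():
        cost = d_p + qd_table[dest].get(port, 0.0)
        if cost < best_cost:
            best_port, best_cost = port, cost
    return best_port, best_cost


port, cost = select_port("R8", pd_table, qd_table)
print(port, cost)
```

In this example the East port has the lowest propagation delay, but its queue is congested, so the North port wins on combined cost. Keeping one entry per direction is also what lets a router fall back to another port when one path is marked unusable.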

The main focus of this paper is to introduce a novel congestion aware, fault tolerant, and process variation aware routing algorithm (CFPA) for asynchronous NoC. In addition, the paper provides a full analysis of the delay components handled by the algorithm, a detailed routing message model, and an analysis of the routing overhead. CFPA reduces the impact of process variation on throughput to negligible values (typically less than 5% of the nominal values). Moreover, the message delay is noticeably reduced compared to the other tested routing algorithms. The proposed algorithm keeps the throughput degradation below 20% for a faulty NoC, whereas other algorithms lose 60% of their throughput due to link and node failures.

The rest of this paper is organized as follows. Related work is discussed in Section 2. In Section 3, the throughput and delay models are introduced. The proposed CFPA algorithm is presented in Section 4. The implementation of CFPA is provided in Section 5. A lightweight version of CFPA is introduced in Section 6. The fault tolerance feature is illustrated in Section 7. Simulation results are provided in Section 8. Finally, conclusions are drawn in Section 9.

Section snippets

Related work

NoC performance is strongly affected by the topology used [4] and the routing algorithm employed [40]. XY, Odd–Even (OE) [40], [42] and Dynamic Adaptive and Deterministic (DyAD) [15] are widely used algorithms in NoCs. In XY, flits are routed in the x direction to the destination column, then vertically in the y direction to the destination row. In OE routing, eight 90-degree turns (e.g. North-East…) are available for routing. Some turns are permitted, and some are not permitted for NoC columns,
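The XY scheme described above can be sketched in a few lines. This is a minimal illustration for a 2-D mesh, not code from the paper: the function names, the `(x, y)` coordinate tuples, the port labels, and the convention that x grows to the East and y to the North are assumptions.

```python
def xy_next_hop(cur, dst):
    """Next output port under XY routing on a 2-D mesh.

    cur and dst are (x, y) router coordinates. Assumed convention:
    x grows to the East and y grows to the North.
    """
    cx, cy = cur
    dx, dy = dst
    if cx != dx:                       # first move along x to the destination column
        return "E" if dx > cx else "W"
    if cy != dy:                       # then move along y to the destination row
        return "N" if dy > cy else "S"
    return "LOCAL"                     # arrived: deliver to the attached PE


def xy_path(src, dst):
    """Sequence of output ports a flit takes from src to dst."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    cur, ports = src, []
    while (port := xy_next_hop(cur, dst)) != "LOCAL":
        ports.append(port)
        cur = (cur[0] + step[port][0], cur[1] + step[port][1])
    return ports


print(xy_path((0, 0), (2, 1)))
```

Because the route depends only on the source and destination coordinates, XY is deterministic: it cannot steer around a congested or failed link, which is the limitation adaptive schemes aim to remove.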

Throughput and delay models

The NoC performance is characterized by its throughput and delay. The introduced throughput and delay models are based on the models proposed in [28]. In Section 3.1 the throughput variation model is presented. The delay model is introduced in Section 3.2.

CFPA algorithm description

The proposed algorithm is congestion-aware, fault tolerant, process-variation-aware and adaptive. Therefore, it is referred to as CFPA. Note also that a modified reduced complexity (lite) version of the algorithm (achieving better performance and scalability) is presented in Section 6.

The number of ports connected to a router varies according to the network topology used and the location of the router within the topology. The assumed router architecture has 5 ports from North (N),

CFPA implementation

CFPA depends on exchanging a set of routing messages to maintain its tables. The propagation delay message (PD_MSG) is used to exchange propagation delay information among routers. The queuing delay message (QD_MSG) carries queuing delay information throughout the network. PD_MSG and QD_MSG are discussed in more detail later in Sections 5.2 and 5.3, respectively. In addition, CFPA uses a still-alive message (SL_MSG), which carries no information except

Lite CFPA algorithm (CFPA$_{L}$)

The queuing delay calculation from source to destination is the primary contribution of the CFPA algorithm. However, the periodic all-to-all broadcasting of QD_MSGs may add considerable overhead. Moreover, extra per-node memory requirements (to store QD_T entries) and computational complexity issues arise as the NoC scales. A lite version of CFPA is proposed in this section to reduce the overhead and achieve a more scalable algorithm for larger NoCs, with a sub-optimum routing path selected for the sake

Fault tolerance feature

As discussed in Section 4, the CFPA algorithm provides up to four alternative routes to every destination. Each router sends an SL_MSG to its directly connected neighbors once every SL$_{U-INT}$ interval to inform them that it is still alive. Furthermore, each router runs a timer for each direct neighbor that advances while no SL_MSG is received from that neighbor. When the timer reaches TTR, the link to that neighbor is taken to be broken. Reporting failure of a route to a node is achieved via sending PD_MSG
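The still-alive timeout described above can be sketched as a small per-router monitor. This is a simplified software model, not the paper's hardware implementation: the class name `NeighborMonitor`, its methods, the explicit `now` timestamps, and the example TTR value are assumptions made for illustration.

```python
class NeighborMonitor:
    """Tracks the time since the last SL_MSG from each direct neighbor."""

    def __init__(self, neighbors, ttr):
        self.ttr = ttr                                 # timeout threshold (TTR in the text)
        self.last_seen = {n: 0.0 for n in neighbors}   # time of the last SL_MSG per neighbor

    def on_sl_msg(self, neighbor, now):
        """Record an SL_MSG arrival, which restarts that neighbor's timer."""
        self.last_seen[neighbor] = now

    def broken_links(self, now):
        """Neighbors whose silence has lasted at least TTR."""
        return sorted(n for n, t in self.last_seen.items() if now - t >= self.ttr)


monitor = NeighborMonitor(["N", "E"], ttr=3.0)
monitor.on_sl_msg("N", now=2.0)        # North keeps reporting; East stays silent
print(monitor.broken_links(now=4.0))   # East has been silent for 4.0 >= TTR
```

In a router that keeps several routes per destination, a port reported here would simply be excluded from path selection, and traffic would use one of the remaining directions.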

Simulation results

A PV-enabled Heterogeneous Network-on-Chip Simulator (HNOCS) [2], [6] is used to evaluate the proposed routing algorithm. HNOCS is an OMNeT++ [36] based simulator. ADS tools and a Monte Carlo (MC) simulator are used at the circuit level (32 nm manufacturing technology) to estimate the propagation delay components (Eqs. (6), (7)). The values of $D_{i-n}$, $D_{o-n}$, $D_{i-pv}$, $D_{o-pv}$ and $D_{int-pv}$ [6] are randomly picked from the MC iteration output values and assigned to different ports and channels of the tested

Conclusions

A Congestion Aware, Fault Tolerant, and Process Variation Aware Adaptive Algorithm (CFPA) is proposed for NoCs. Two versions of CFPA are proposed. A lite version (referred to as CFPA$_{L}$) has less overhead and computational complexity.

CFPA relies on maintaining a propagation delay (including PV delay) routing table as well as a queuing delay table, with periodic data exchange, to select the shortest route. Choosing the best path depends on both queuing and propagation delays. Considering queuing delay

Acknowledgment

The second and third authors were supported in part by the Distributed and Networked Systems Research Group Operating, United Arab Emirates Grant No. 150410, University of Sharjah.

References (42)

Gawish E.K. et al., Variability-tolerant routing algorithms for networks-on-chip, Microprocess. Microsyst. B (2014)
Gawish E.K. et al., Process variability-induced NoC link failure: A probabilistic model, Microelectron. J. (2015)
Li Sheng et al., The McPAT framework for multicore and manycore architectures: Simultaneously modeling power, area, and timing, ACM Trans. Architecture Code Optim. (TACO) (2013)
Muhammad S.T. et al., Architecture level analysis for process variation in synchronous and asynchronous networks-on-chip, J. Parallel Distrib. Comput. (2017)
M. Ali, W. Michael, S. Hessler, A fault tolerant mechanism for handling permanent and transient failures in network on...
Y. Ben-Itzhak, E. Zahavi, I. Cidon, A. Kolodny, HNOCS: Modular open-source simulator for heterogeneous NoCs, in: The...
M. Boegli, T. De Laet, J. De Schutter, J. Swevers, A split-horizon scheme for on-line friction parameter estimation,...
Elmiligi H. et al., Improving networks-on-chip performability: A topology-based approach, Intl. J. Circuit Theory Appl. (2011)
Ezz-Eldin R. et al., Analysis and Design of Networks-on-Chip Under High Process Variation (2015)
Ezz-Eldin R. et al., Process variation delay and congestion aware routing algorithm for asynchronous NoC design, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2015)

Feng C. et al., Addressing transient and permanent faults in NoC with efficient fault-tolerant deflection router, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2013)
Giuseppe A. et al., Implementation and analysis of a new selection strategy for adaptive routing in networks-on-chip, IEEE Trans. Comput. (2008)
Grama A. et al., Basic Communication Operation, in Introduction To Parallel Computing (2003)
P. Gratz, B. Grot, W. Keckler, Regional congestion awareness for load balance in networks-on-chip, in: The Proceedings...
http://www.itrs.net/Links/2013ITRS/Home2013.htm (last visited: January...
J. Hu, R. Marculescu, DyAD: Smart routing for networks-on-chip, in: The Proceedings of the Annual Design Automation...
D. Jindun, J. Xin, L. Renjie, W. Takahiro, An efficient deadlock-free adaptive routing algorithm for 3D...
A. Johannes, N. Kucza, M. Vohrmann, T. Jungeblut, M. Porrmann, U. Rückert, Comparing synchronous, mesochronous and...
Kar R. et al., An explicit approach for delay evaluation for on-chip RC interconnects using beta distribution function by moment matching technique, Proc. Intl. Conf. Recent Trends Inf. Telecommun. Comput. (2010)
A. Kothari, D. Patel, Methodology to solve the count-to-infinity problem by accepting and forwarding correct and...


Sayed Taha Muhammad received the B.S. (Hons.) degree in communication and electronics engineering from Cairo University, Egypt, in 2004, the Software Engineering Diploma degree from the Information Technology Institute (ITI), Cairo, Egypt, in 2008, and the M.Sc. degree in electronics engineering from Fayoum University, in 2014. He received the Ph.D. degree in electrical and computers engineering from Minia University, Minia, Egypt, in 2018. He joined the Department of Electrical Engineering, Beni-Suef University, Beni-Suef, Egypt, in 2009, as a demonstrator, and became an Assistant Lecturer in 2014. He joined the Computers and Systems Engineering Department at Fayoum University, Fayoum, Egypt, in 2017. He is currently working on his post-Ph.D. research. He has recently published in the IEEE IDT conference, JPDC, TVLSI, the IEEE ICEAC conference and one Springer book chapter. His current research interests include networks-on-a-chip (NoC) power dissipation, process variation in NoC, routing algorithms, software engineering, and computer architecture.

Mohamed Saad (Senior Member, IEEE) received the Ph.D. degree in electrical and computer engineering from McMaster University, Hamilton, Canada, in 2004. He is currently an Associate Professor at the Department of Electrical and Computer Engineering, University of Sharjah, UAE. His research interests include networking, communications and optimization, with current activity focused on the optimal design of wireless and wired communication networks, and optimal network resource management. He has also held research positions with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada, and the Advanced Optimization Laboratory at the Department of Computing and Software, McMaster University, Hamilton, Canada. Dr. Saad is an editor for the International Journal of Distributed Sensor Networks. He was the recipient of the best paper award in the IEEE Symposium on Computers and Communications, Riccione, Italy, June 2010. He was the recipient of the University of Sharjah “Annual Incentive Award for Distinguished Faculty Members”, for excellence in research, April 2010 (university-wide). He also received two best teaching awards from the IEEE Women in Engineering Society, University of Sharjah (in 2007 and 2009). He was also the recipient of a 2005–2006 Natural Sciences and Engineering Research Council of Canada (NSERC) post-doctoral fellowship.

Ali A. El-Moursy received the B.S. (Hons.) degree in electronics and communications engineering and the master’s degree in computer engineering from Cairo University, Cairo, Egypt, in 1996 and 2000, respectively, and the master’s degree in electrical engineering and the Ph.D. degree in high performance computer architecture from University of Rochester, Rochester, NY, USA, in 2001 and 2005, respectively. He was with the Software Solution Group, Intel Corporation, Santa Clara, CA, USA, till 2007. He joined the Electronics Research Institute, Giza, Egypt, in 2007. His current research interests include high-performance computer architecture, multicore multithreaded micro-architecture, power-aware micro-architecture, simulation and modeling of architecture performance and power, workload profiling and characterization, cell programming, high performance computing, parallel computing, and cloud computing. He has also participated with the IBM Cairo Technology Development Center, Giza, Egypt, as a Visitor Research Scientist, in many Cell BE and Blue Gene projects and research activities from 2007 to 2010. He was with GRD Egypt DVT Modelsim Team, Mentor Graphics Egypt, Cairo, Egypt, as a Senior Development Engineer, from 2010 to 2011. He joined the Department of Electrical and Computer Engineering, University of Sharjah, Sharjah, United Arab Emirates, in 2010, as an Assistant Professor.

Magdy A. El-Moursy was born in Cairo, Egypt in 1974. He received the B.S. degree in electronics and communications engineering (with honors) and the Master’s degree in computer networks from Cairo University, Cairo, Egypt, in 1996 and 2000, respectively, and the Master’s and the Ph.D. degrees in electrical engineering in the area of high-performance VLSI/IC design from University of Rochester, Rochester, NY, USA, in 2002 and 2004, respectively. In summer of 2003, he was with STMicroelectronics, Advanced System Technology, San-Diego, CA, USA. Between September 2004 and September 2006, he was a Senior Design Engineer at Portland Technology Development, Intel Corporation, Hillsboro, OR, USA. Between September 2006 and February 2008, he was an Assistant Professor in the Information Engineering and Technology Department of the German University in Cairo (GUC), Cairo, Egypt. Between February 2008 and October 2010, he was Technical Lead in the Mentor Hardware Emulation Division, Mentor Graphics Corporation, Cairo, Egypt. Dr. El-Moursy is currently Staff Engineer in the Design Creation and Synthesis Division, Mentor Graphics Corporation, and Associate Professor in the Microelectronics Department, Electronics Research Institute, Cairo, Egypt. He is an Associate Editor on the editorial boards of the Elsevier Microelectronics Journal, the International Journal of Circuits and Architecture Design and the Journal of Circuits, Systems, and Computers, and serves on the Technical Program Committees of many IEEE conferences such as ISCAS, ICM, ICAINA, PacRim CCCSP, ISESD, SIECPC, and IDT. His research interests are in Networks-on-Chip/System-on-Chip, interconnect design and related circuit level issues in high performance VLSI circuits, clock distribution network design, digital ASIC circuit design, VLSI/SoC/NoC design and validation/verification, circuit verification and testing, embedded systems and low power design. He is the author of 70 papers, five book chapters, and three books in the fields of high speed and low power CMOS design techniques and NoC/SoC.

Hesham F. A. Hamed was born in Giza, Egypt, in 1966. He received the B.Sc. degree in electrical engineering and the M.Sc. degree in electronics and communications engineering from Minia University, EL-Minia, Egypt, in 1989 and 1993, respectively. He received the Ph.D. degree in electronics and communications engineering from Texas A&M University, College Station, Texas, USA, and Minia University (as Channel system between the two universities) in 1997. He is currently the Dean of the Faculty of Engineering, Minia University, EL-Minia. From 1989 to 1993 he worked as a Teacher Assistant in the Electrical Engineering Department, Minia University. From 1993 to 1995 he was a visiting scholar at Cairo University, Cairo, Egypt. From 1995 to 1997 he was a visiting scholar at Texas A&M University, College Station, Texas (with the VLSI group). From 1997 to 2003 he was an Assistant Professor in the Electrical Engineering Department, Minia University. From 2003 to 2005 he was an Associate Professor at Alkharj College of Technology, Alkharj, KSA. From 2005 to 2007 he was a Visiting Professor at Ohio University, Athens, Ohio, USA. He has published more than 70 papers, one book and three book chapters. His research interests include analog and mixed-mode circuit design, low voltage low power analog circuits (CMOS and BiCMOS) and digital circuits, current mode circuits, nano-technology circuit design, FPGA, FPAA and implementation of DES and AES algorithms on FPGA.
