CFPA: Congestion aware, fault tolerant and process variation aware adaptive routing algorithm for asynchronous Networks-on-Chip

https://doi.org/10.1016/j.jpdc.2019.03.001

Highlights

  • A new routing algorithm (called CFPA) for networks-on-chip is proposed.

  • CFPA is congestion-aware, process-variation-delay-aware and fault-tolerant.

  • CFPA provides significant improvement in throughput, delay and fault-tolerance.

Abstract

Delays caused by congestion, faults and process variation (PV) degrade networks-on-chip (NoC) performance. A congestion-aware, fault-tolerant and process-variation-aware adaptive routing algorithm (CFPA) is introduced for congested and faulty asynchronous NoCs. The proposed routing algorithm maintains two routing tables to determine the packet path: one holds routing directions based on propagation delay (including PV delay), and the other tracks the queuing delays at each router port. The queuing delay is used as an indicator of congestion. The routing tables store multiple paths to every destination via all polar directions, which makes CFPA tolerant to path failures. The proposed algorithm is evaluated against other popular NoC routing algorithms for different topologies and network dimensions. On average, CFPA improves NoC throughput by 60% compared to recently proposed routing algorithms. With CFPA, the impact of faults on NoC throughput is reduced by 48%. In addition, under process variation conditions, the average delay of messages routed using CFPA is 26% to 75% shorter than that of the other algorithms. Furthermore, the proposed algorithm limits the impact of PV on NoC throughput to less than 5% of the nominal throughput for the mesh topology.

Introduction

Recent advances in digital electronics have made System-on-Chip (SoC) integration a reality. SoC allows designers to integrate a large number of processing elements (PEs) into a single chip [38]. In a conventional SoC, PEs are interconnected via buses. As SoCs become more complex, designers face scalability and power dissipation issues. Network-on-Chip (NoC) was proposed to overcome this interconnection limitation. In the NoC paradigm, the bus-control logic is replaced with routers, which make NoCs smarter and more scalable.

Similar to computer networks, NoCs employ routing algorithms to forward data flits (the segments into which packets are divided in NoCs) between routers toward their respective destinations. Adaptive routing improves NoC performance and provides tolerance to link and node failures. Despite their implementation overhead and complexity, adaptive algorithms are preferable in NoCs due to their superior throughput and delay performance. A key role of the routing algorithm is to avoid congestion and to balance the traffic load across the links of the network [23]. In addition, fault tolerance is needed to avoid deadlocks and data loss.

Manufacturing variations (e.g., threshold voltage and gate length variations) cause process parameters to deviate from their nominal values, which is known as process variation (PV). As technology scales down, PV becomes more severe, although asynchronous design tolerates PV better with technology scaling. PV effects could also be mitigated using advanced routing techniques. Communication channels connecting network nodes have interconnect delays that contribute to packet propagation delay and hence degrade overall NoC performance [18]. Many techniques have been proposed to mitigate the impact of PV [22], [39]. Most of them, however, ignore the routing mechanism, which is strongly affected by PV delays and the resulting congestion.

The majority of computer network routing protocols fall into two main categories: link state (LS) and distance vector (DV) [30]. To the best of our knowledge, neither DV nor LS algorithms have been proposed for asynchronous NoCs. Customized versions of the LS and DV protocols are proposed in [1], [35] to reduce the communication complexity and overhead in synchronous NoCs. The customized LS protocol exchanges only updated link status, without frequently broadcasting data over the network. An asynchronous router for time-division-multiplexed NoCs is proposed in [20]. In [29], an asynchronous NoC router employing an adaptive routing algorithm is proposed to support communication with synchronous PEs. A source-routing-based asynchronous NoC design is introduced in [33], which also presents a timing organization analysis for Globally Asynchronous Locally Synchronous (GALS) and asynchronous architectures [17]. In [41], a reconfigurable routing scheme is proposed for fault handling in 2-D NoCs, with noticeable area overhead. With the advance of today's fabrication technology, accounting for router-port queuing delay and for the impact of process variation on asynchronous NoC performance has become an essential design need. Not all of the mentioned algorithms account for these two performance-limiting factors.

Fig. 1 depicts a NoC consisting of nine cores: nine PEs are interconnected via routers in a mesh topology, and every PE is attached to its own router. Based on the asynchronous router architecture proposed in [6], the proposed routing algorithm is deployed in the routing unit of the NoC router. Handshaking protocol signals are used to synchronize message transmission among routers [28]. Flits traveling from their sources to their respective destinations experience a propagation delay DP and a queuing delay DQ. The role of the routing algorithm is to select the best path for data flits so as to minimize the impact of congestion and PV on DP and DQ, and to tolerate runtime faults.
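
As a rough illustration of the DP/DQ trade-off, the Python sketch below (not the authors' implementation) models the nine-core mesh of Fig. 1 as a 3×3 grid and runs Dijkstra's algorithm with a per-hop cost of DP (link propagation delay) plus DQ (queuing delay at the output port). The delay values, the defaults of 1.0 and 0.0, and the function names are hypothetical; CFPA itself builds its tables through distributed message exchange rather than a centralized shortest-path search.

```python
import heapq
from typing import Dict, List, Tuple

Node = Tuple[int, int]    # (x, y) router coordinates; y grows toward North (assumption)
Link = Tuple[Node, Node]  # directed link (from_router, to_router)


def mesh_neighbors(node: Node, width: int, height: int) -> List[Node]:
    """Return the N, E, S, W neighbors of a router inside a width x height mesh."""
    x, y = node
    candidates = [(x, y + 1), (x + 1, y), (x, y - 1), (x - 1, y)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < width and 0 <= cy < height]


def min_delay_path(src: Node, dst: Node, width: int, height: int,
                   dp: Dict[Link, float], dq: Dict[Link, float]) -> Tuple[List[Node], float]:
    """Dijkstra search where each hop costs DP(link) + DQ(output port of that link).

    Links missing from dp default to 1.0 (hypothetical nominal delay);
    links missing from dq default to 0.0 (no congestion).
    """
    dist: Dict[Node, float] = {src: 0.0}
    prev: Dict[Node, Node] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in mesh_neighbors(u, width, height):
            cost = d + dp.get((u, v), 1.0) + dq.get((u, v), 0.0)
            if cost < dist.get(v, float("inf")):
                dist[v] = cost
                prev[v] = u
                heapq.heappush(heap, (cost, v))
    # Walk predecessors back from the destination to rebuild the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


# Congest the eastbound port of router (0, 0) and the southbound port of router (1, 1).
queuing = {((0, 0), (1, 0)): 5.0, ((1, 1), (1, 0)): 5.0}
route, delay = min_delay_path((0, 0), (2, 0), 3, 3, dp={}, dq=queuing)
print(route, delay)  # [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)] 4.0
```

Because the eastbound queue at (0, 0) is long, the search detours north even though the detour has twice as many hops. This is the kind of congestion-driven path choice the paragraph above describes.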

The main focus of this paper is to introduce a novel congestion-aware, fault-tolerant and process-variation-aware routing algorithm (CFPA) for asynchronous NoCs. In addition, a full analysis of the delay components that the algorithm deals with is provided, together with a detailed routing message model and an analysis of the routing overhead. CFPA reduces the impact of process variation on throughput to negligible values (typically less than 5% of the nominal values). Moreover, message delay is noticeably reduced compared to the other tested routing algorithms. The proposed algorithm keeps the throughput degradation below 20% in a faulty NoC, whereas other algorithms lose 60% of their throughput due to link and node failures.

The rest of this paper is organized as follows. Related work is discussed in Section 2. In Section 3, the throughput and delay models are introduced. The proposed CFPA algorithm is presented in Section 4. The implementation of CFPA is provided in Section 5. A lightweight version of CFPA is introduced in Section 6. The fault tolerance feature is illustrated in Section 7. Simulation results are provided in Section 8. Finally, conclusions are drawn in Section 9.

Related work

NoC performance is strongly affected by the topology used [4] and the routing algorithm employed [40]. XY, Odd–Even (OE) [40], [42] and Dynamic Adaptive and Deterministic (DyAD) [15] are widely used NoC routing algorithms. In XY routing, flits are routed in the x direction to the destination column, then in the y direction to the destination row. In OE routing, eight 90-degree turns (e.g., North-East…) are available for routing. Some turns are permitted, and some are not permitted for NoC columns, …
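
The XY rule and the odd-even turn restrictions can be sketched as below. The XY part follows the description above. The OE part encodes the two turn rules of the classic odd-even turn model (no East-to-North or East-to-South turns in even columns, no North-to-West or South-to-West turns in odd columns), which the truncated text does not spell out, so treat it as a reference sketch rather than the exact variant used in [40], [42]. Coordinates with y growing northward, column numbering from 0, and all function names are assumptions.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y); x indexes columns, y grows toward North (assumption)


def xy_next_direction(cur: Coord, dst: Coord) -> str:
    """XY (dimension-order) routing: finish the x offset first, then the y offset."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "LOCAL"  # arrived: eject to the local PE


def oe_turn_allowed(column: int, incoming: str, outgoing: str) -> bool:
    """Odd-even turn model check for a 90-degree turn taken at a router in `column`.

    `incoming` is the direction the flit was travelling (E, W, N, S) and
    `outgoing` the direction it will travel next. U-turns are not modelled.
    """
    turn = incoming + outgoing
    if column % 2 == 0:
        return turn not in ("EN", "ES")  # even column: no East->North / East->South
    return turn not in ("NW", "SW")      # odd column: no North->West / South->West


def xy_route(src: Coord, dst: Coord) -> List[str]:
    """List the output directions taken hop by hop under XY routing."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    hops, cur = [], src
    while (d := xy_next_direction(cur, dst)) != "LOCAL":
        hops.append(d)
        cur = (cur[0] + step[d][0], cur[1] + step[d][1])
    return hops


print(xy_route((0, 0), (2, 1)))                                  # ['E', 'E', 'N']
print(oe_turn_allowed(2, "E", "N"), oe_turn_allowed(1, "E", "N"))  # False True
```

The two functions are independent: an XY path need not satisfy the OE rules. The example route makes its East-to-North turn in column 2, which is exactly a turn the OE model forbids in an even column.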

Throughput and delay models

The NoC performance is characterized by its throughput and delay. The introduced throughput and delay models are based on the models proposed in [28]. In Section 3.1 the throughput variation model is presented. The delay model is introduced in Section 3.2.

CFPA algorithm description

The proposed algorithm is congestion-aware, fault-tolerant, process-variation-aware and adaptive; hence, it is referred to as CFPA. A modified, reduced-complexity (lite) version of the algorithm, offering better performance and scalability, is presented in Section 6.

The number of ports connected to a router varies according to the network topology and the router's location within the topology. The assumed router architecture has five ports: North (N), …
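
One plausible reading of the two-table design described in the abstract is sketched below. For each destination, a router keeps a propagation-delay estimate (including PV delay) and a queuing-delay estimate per output direction, and forwards the flit on the live direction with the smallest sum. The table layout, the names `pd_table` and `qd_table`, the default of zero queuing delay for missing entries, and the tie-breaking order are all hypothetical and are not necessarily the exact selection rule of Section 4.

```python
import math
from typing import Dict, Iterable, Optional

DIRECTIONS = ("N", "E", "S", "W")  # mesh output ports; the local port is not a routing choice

# Hypothetical layout: table[destination][direction] -> delay estimate (arbitrary time units)
DelayTable = Dict[str, Dict[str, float]]


def select_direction(dst: str, pd_table: DelayTable, qd_table: DelayTable,
                     alive: Iterable[str] = DIRECTIONS) -> Optional[str]:
    """Pick the live output direction minimizing propagation + queuing delay to dst.

    Missing propagation entries count as unreachable (inf); missing queuing
    entries count as an empty queue (0.0). Returns None if no direction reaches dst.
    """
    alive_set = set(alive)
    best_dir, best_cost = None, math.inf
    for d in DIRECTIONS:  # fixed order gives deterministic tie-breaking
        if d not in alive_set:
            continue
        cost = pd_table.get(dst, {}).get(d, math.inf) + qd_table.get(dst, {}).get(d, 0.0)
        if cost < best_cost:
            best_dir, best_cost = d, cost
    return best_dir


pd = {"R8": {"N": 4.0, "E": 3.5, "W": 6.0}}  # via S, R8 is unknown/unreachable
qd = {"R8": {"N": 0.5, "E": 2.0}}
print(select_direction("R8", pd, qd))                     # N (4.5 < 5.5 < 6.0)
print(select_direction("R8", pd, qd, alive=("E", "W")))  # E once the N link has failed
```

East has the shortest propagation delay here, but its queue makes North the cheaper choice. Restricting `alive` shows how keeping an entry for every direction lets the router fall back when a link fails.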

CFPA implementation

CFPA relies on exchanging a set of routing messages to maintain its tables. The propagation delay message (PD_MSG) is used to exchange propagation delay information among routers. The queuing delay message (QD_MSG) carries queuing delay information throughout the network. PD_MSG and QD_MSG are discussed in detail in Sections 5.2 (Propagation delay handling) and 5.3 (Queuing delay handling), respectively. In addition, CFPA uses a still-alive message (SL_MSG), which carries no information except …
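
Because PD_MSGs spread propagation-delay information from router to router, their handling resembles a distance-vector update (the DV family mentioned in the introduction). The sketch below shows only that idea. The message fields, the per-port `link_delay` values, and the rule that a newer advertisement on a port overwrites the old entry are assumptions, not the format defined in Section 5.2. The sketch also has no protection against count-to-infinity or routing loops.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PDMsg:
    """Hypothetical PD_MSG: the sender's best known propagation delay to each destination."""
    sender: str
    delays: Dict[str, float]


@dataclass
class Router:
    name: str
    link_delay: Dict[str, float]  # output port -> delay of port plus channel (incl. PV), assumed known
    pd_table: Dict[str, Dict[str, float]] = field(default_factory=dict)  # dst -> port -> delay

    def on_pd_msg(self, port: str, msg: PDMsg) -> bool:
        """Bellman-Ford style relaxation; True if the table changed (i.e. re-advertise)."""
        changed = False
        for dst, remote in msg.delays.items():
            if dst == self.name:
                continue  # no entry needed for ourselves
            new = self.link_delay[port] + remote
            row = self.pd_table.setdefault(dst, {})
            if row.get(port) != new:  # latest advertisement on a port replaces the old value
                row[port] = new
                changed = True
        return changed

    def advertisement(self) -> PDMsg:
        """Best delay per destination over all ports, plus 0 for the router itself."""
        best = {dst: min(row.values()) for dst, row in self.pd_table.items() if row}
        best[self.name] = 0.0
        return PDMsg(self.name, best)


r4 = Router("R4", link_delay={"N": 1.25, "E": 1.0})
r4.on_pd_msg("E", PDMsg("R5", {"R5": 0.0, "R8": 1.5}))
r4.on_pd_msg("N", PDMsg("R7", {"R7": 0.0, "R8": 1.0, "R4": 1.25}))
print(r4.pd_table["R8"])          # {'E': 2.5, 'N': 2.25}
print(r4.advertisement().delays)  # {'R5': 1.0, 'R8': 2.25, 'R7': 1.25, 'R4': 0.0}
```

The sketch keeps one entry per port rather than only the best one. That retained per-direction information is what allows a router to switch to another direction when the preferred one fails, in the spirit of the "multiple paths via all polar directions" property.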

Lite CFPA algorithm (CFPAL)

Calculating the queuing delay from source to destination is the primary contribution of the CFPA algorithm. However, the periodic all-to-all broadcasting of QD_MSGs may add considerable overhead. Moreover, the per-node memory needed to store QD_T entries and the computational complexity grow as the NoC scales. A lite version of CFPA is proposed in this section to reduce this overhead and obtain a more scalable algorithm for larger NoCs, with a sub-optimum routing path selected for the sake …
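
A back-of-the-envelope count shows why all-to-all QD_MSG broadcasting scales poorly. Assume, hypothetically, that every router originates one QD_MSG per update period that must reach every other router, and that QD_T holds one entry per destination per mesh output port. Under these assumptions, both message deliveries and table size grow with the network size N = k² of a k×k mesh. Per-hop forwarding cost is ignored, and the counts are illustrative, not the overhead figures reported in the paper.

```python
from typing import Tuple


def qd_overhead(k: int, ports: int = 4) -> Tuple[int, int, int]:
    """Rough QD_MSG overhead for a k x k mesh under the stated assumptions.

    Returns (routers, end-to-end QD_MSG deliveries per update period,
    QD_T entries stored per router).
    """
    n = k * k
    deliveries = n * (n - 1)   # every router's QD_MSG reaches the other n - 1 routers
    entries = (n - 1) * ports  # one queuing-delay entry per (destination, output port)
    return n, deliveries, entries


for k in (3, 4, 8, 16):
    n, deliveries, entries = qd_overhead(k)
    print(f"{k}x{k} mesh: {n} routers, {deliveries} deliveries/period, {entries} QD_T entries/router")
```

Deliveries grow roughly as N², that is, as k⁴, while per-router memory grows linearly in N. This is the scaling pressure that the lite version is meant to relieve.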

Fault tolerance feature

As discussed in Section 4, the CFPA algorithm provides up to four alternative routes to every destination. Each router sends an SL_MSG to its directly connected neighbors every SL_UINT interval to inform them that it is still alive. Each router also keeps a timer for every direct neighbor, which is restarted whenever an SL_MSG arrives from that neighbor; when the timer reaches TTR without an SL_MSG being received, the link to this neighbor is considered broken. Reporting the failure of a route to a node is achieved by sending PD_MSG …
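
The still-alive mechanism can be sketched as a per-neighbor timer. The class below is a hypothetical illustration. It records the time of the last SL_MSG on each port, declares a port broken once TTR time units pass without one, and reports the surviving ports, which a router could then hand to its direction-selection step. Plain float timestamps, the `ttr` parameter and the method names are assumptions, and the PD_MSG-based failure reporting described above is not modelled.

```python
from typing import Dict, Iterable, List, Set


class NeighborMonitor:
    """Per-port liveness tracking driven by SL_MSG arrivals (illustrative only)."""

    def __init__(self, ports: Iterable[str], ttr: float, start: float = 0.0):
        self.ttr = ttr
        self.last_seen: Dict[str, float] = {p: start for p in ports}
        self.broken: Set[str] = set()

    def on_sl_msg(self, port: str, now: float) -> None:
        """An SL_MSG arrived on `port`: restart its timer and revive the link if needed."""
        self.last_seen[port] = now
        self.broken.discard(port)

    def check(self, now: float) -> List[str]:
        """Return ports newly declared broken because their timer reached TTR."""
        newly = [p for p, t in self.last_seen.items()
                 if p not in self.broken and now - t >= self.ttr]
        self.broken.update(newly)
        return newly

    def alive_ports(self) -> List[str]:
        return [p for p in self.last_seen if p not in self.broken]


mon = NeighborMonitor(["N", "E", "S", "W"], ttr=3.0)
for t in (1.0, 2.0):
    for port in ("N", "E", "S"):  # the West neighbor has gone silent
        mon.on_sl_msg(port, t)
print(mon.check(now=4.0))  # ['W']: last heard at 0.0, and 4.0 - 0.0 >= 3.0
print(mon.alive_ports())   # ['N', 'E', 'S']
```

Receiving a later SL_MSG on a broken port revives it, so a transient silence does not permanently remove a direction.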

Simulation results

A PV-enabled Heterogeneous Network-on-Chip Simulator (HNOCS) [2], [6] is used to evaluate the proposed routing algorithm. HNOCS is an OMNeT++-based simulator [36]. ADS tools and a Monte Carlo (MC) simulator are used at the circuit level (32 nm technology) to estimate the propagation delay components (Eqs. (6), (7)). The values of Din, Don, Dipv, Dopv and Dintpv [6] are randomly picked from the MC output values and assigned to different ports and channels of the tested …
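
The random assignment of circuit-level delay samples to the simulated ports and channels can be sketched as follows. The sample pools are synthetic Gaussian placeholders standing in for the ADS/Monte Carlo output; they are not the paper's data. Giving every port or channel one draw from each of the five pools is a simplifying assumption, and the element ids are hypothetical. How Din, Don, Dipv, Dopv and Dintpv combine into the propagation delay is defined by Eqs. (6), (7), which are not reproduced here.

```python
import random
from typing import Dict, Hashable, List

DELAY_COMPONENTS = ("Din", "Don", "Dipv", "Dopv", "Dintpv")  # names taken from the text


def assign_pv_delays(mc_samples: Dict[str, List[float]], elements: List[Hashable],
                     seed: int = 0) -> Dict[Hashable, Dict[str, float]]:
    """Give each port/channel one randomly picked MC sample per delay component.

    A fixed seed makes a simulation run reproducible.
    """
    rng = random.Random(seed)
    return {elem: {name: rng.choice(mc_samples[name]) for name in DELAY_COMPONENTS}
            for elem in elements}


# Placeholder pools (arbitrary units) standing in for circuit-level MC iterations.
pool_rng = random.Random(42)
pools = {name: [pool_rng.gauss(10.0, 1.0) for _ in range(1000)] for name in DELAY_COMPONENTS}

# Hypothetical element ids: (router, output port) pairs for the nine routers of a 3x3 mesh.
ports = [(r, p) for r in range(9) for p in ("N", "E", "S", "W")]
assignment = assign_pv_delays(pools, ports, seed=7)
print(assignment[(4, "E")])
```

Drawing from the MC output, rather than from a fitted distribution, keeps whatever non-Gaussian shape the circuit-level results have. The fixed seed makes each PV scenario repeatable across the routing algorithms being compared.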

Conclusions

A congestion-aware, fault-tolerant and process-variation-aware adaptive routing algorithm (CFPA) has been proposed for NoCs. Two versions of CFPA are proposed; the lite version (referred to as CFPAL) has lower overhead and computational complexity.

CFPA relies on maintaining a propagation delay (including PV delay) routing table as well as a queuing delay table, with periodic data exchange, to select the shortest route. Choosing the best path depends on both queuing and propagation delays. Considering queuing delay …

Acknowledgment

The second and third authors were supported in part by the Distributed and Networked Systems Research Group Operating, United Arab Emirates Grant No. 150410, University of Sharjah.

References (42)

  • R. Ezz-Eldin et al., Process variation delay and congestion aware routing algorithm for asynchronous NoC design, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2015).
  • C. Feng et al., Addressing transient and permanent faults in NoC with efficient fault-tolerant deflection router, IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2013).
  • G. Ascia et al., Implementation and analysis of a new selection strategy for adaptive routing in networks-on-chip, IEEE Trans. Comput. (2008).
  • A. Grama et al., Basic Communication Operations, in: Introduction to Parallel Computing (2003).
  • P. Gratz, B. Grot, W. Keckler, Regional congestion awareness for load balance in networks-on-chip, in: The Proceedings...
  • http://www.itrs.net/Links/2013ITRS/Home2013.htm (last visited: January...
  • J. Hu, R. Marculescu, DyAD: Smart routing for networks-on-chip, in: The Proceedings of the Annual Design Automation...
  • D. Jindun, J. Xin, L. Renjie, W. Takahiro, An efficient deadlock-free adaptive routing algorithm for 3D...
  • A. Johannes, N. Kucza, M. Vohrmann, T. Jungeblut, M. Porrmann, U. Rückert, Comparing synchronous, mesochronous and...
  • R. Kar et al., An explicit approach for delay evaluation for on-chip RC interconnects using beta distribution function by moment matching technique, Proc. Intl. Conf. Recent Trends Inf. Telecommun. Comput. (2010).
  • A. Kothari, D. Patel, Methodology to solve the count-to-infinity problem by accepting and forwarding correct and...

    Sayed Taha Muhammad received the B.S. (Hons.) degree in communication and electronics engineering from Cairo University, Egypt, in 2004, the Software Engineering Diploma from the Information Technology Institute (ITI), Cairo, Egypt, in 2008, and the M.Sc. degree in electronics engineering from Fayoum University in 2014. He received the Ph.D. degree in electrical and computer engineering from Minia University, Minia, Egypt, in 2018. He joined the Department of Electrical Engineering, Beni-Suef University, Beni-Suef, Egypt, in 2009 as a demonstrator, and became an Assistant Lecturer in 2014. He joined the Computers and Systems Engineering Department at Fayoum University, Fayoum, Egypt, in 2017, and is currently pursuing postdoctoral research. He has recently published in the IEEE IDT conference, JPDC, IEEE TVLSI and the IEEE ICEAC conference, as well as one Springer book chapter. His current research interests include networks-on-chip (NoC) power dissipation, process variation in NoCs, routing algorithms, software engineering and computer architecture.

    Mohamed Saad (Senior Member, IEEE) received the Ph.D. degree in electrical and computer engineering from McMaster University, Hamilton, Canada, in 2004. He is currently an Associate Professor at the Department of Electrical and Computer Engineering, University of Sharjah, UAE. His research interests include networking, communications and optimization, with current activity focused on the optimal design of wireless and wired communication networks and on optimal network resource management. He has also held research positions with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada, and the Advanced Optimization Laboratory at the Department of Computing and Software, McMaster University, Hamilton, Canada. Dr. Saad is an editor for the International Journal of Distributed Sensor Networks. He received the best paper award at the IEEE Symposium on Computers and Communications, Riccione, Italy, in June 2010, and the University of Sharjah “Annual Incentive Award for Distinguished Faculty Members” for excellence in research in April 2010 (university-wide). He also received two best teaching awards from the IEEE Women in Engineering Society, University of Sharjah (in 2007 and 2009). He was also the recipient of a 2005–2006 Natural Sciences and Engineering Research Council of Canada (NSERC) post-doctoral fellowship.

    Ali A. El-Moursy received the B.S. (Hons.) degree in electronics and communications engineering and the master’s degree in computer engineering from Cairo University, Cairo, Egypt, in 1996 and 2000, respectively, and the master’s degree in electrical engineering and the Ph.D. degree in high-performance computer architecture from the University of Rochester, Rochester, NY, USA, in 2001 and 2005, respectively. He was with the Software Solution Group, Intel Corporation, Santa Clara, CA, USA, until 2007. He joined the Electronics Research Institute, Giza, Egypt, in 2007. His current research interests include high-performance computer architecture, multicore multithreaded micro-architecture, power-aware micro-architecture, simulation and modeling of architecture performance and power, workload profiling and characterization, Cell programming, high-performance computing, parallel computing and cloud computing. From 2007 to 2010, he also participated in many Cell BE and Blue Gene projects and research activities as a Visiting Research Scientist with the IBM Cairo Technology Development Center, Giza, Egypt. From 2010 to 2011, he was a Senior Development Engineer with the GRD Egypt DVT Modelsim Team, Mentor Graphics Egypt, Cairo, Egypt. He joined the Department of Electrical and Computer Engineering, University of Sharjah, Sharjah, United Arab Emirates, in 2010 as an Assistant Professor.

    Magdy A. El-Moursy was born in Cairo, Egypt, in 1974. He received the B.S. degree in electronics and communications engineering (with honors) and the Master’s degree in computer networks from Cairo University, Cairo, Egypt, in 1996 and 2000, respectively, and the Master’s and Ph.D. degrees in electrical engineering in the area of high-performance VLSI/IC design from the University of Rochester, Rochester, NY, USA, in 2002 and 2004, respectively. In the summer of 2003, he was with STMicroelectronics, Advanced System Technology, San Diego, CA, USA. Between September 2004 and September 2006, he was a Senior Design Engineer at Portland Technology Development, Intel Corporation, Hillsboro, OR, USA. Between September 2006 and February 2008, he was an Assistant Professor in the Information Engineering and Technology Department of the German University in Cairo (GUC), Cairo, Egypt. Between February 2008 and October 2010, he was a Technical Lead in the Mentor Hardware Emulation Division, Mentor Graphics Corporation, Cairo, Egypt. Dr. El-Moursy is currently a Staff Engineer in the Design Creation and Synthesis Division, Mentor Graphics Corporation, and an Associate Professor in the Microelectronics Department, Electronics Research Institute, Cairo, Egypt. He is an Associate Editor on the editorial boards of the Elsevier Microelectronics Journal, the International Journal of Circuits and Architecture Design and the Journal of Circuits, Systems, and Computers, and a member of the technical program committees of many IEEE conferences, such as ISCAS, ICM, ICAINA, PacRim CCCSP, ISESD, SIECPC and IDT. His research interests include networks-on-chip/systems-on-chip, interconnect design and related circuit-level issues in high-performance VLSI circuits, clock distribution network design, digital ASIC circuit design, VLSI/SoC/NoC design and validation/verification, circuit verification and testing, embedded systems and low-power design. He is the author of 70 papers, five book chapters and three books in the fields of high-speed and low-power CMOS design techniques and NoC/SoC.

    Hesham F. A. Hamed was born in Giza, Egypt, in 1966. He received the B.Sc. degree in electrical engineering and the M.Sc. degree in electronics and communications engineering from Minia University, El-Minia, Egypt, in 1989 and 1993, respectively. He received the Ph.D. degree in electronics and communications engineering from Texas A&M University, College Station, Texas, USA, and Minia University (under a channel system between the two universities) in 1997. He is currently the Dean of the Faculty of Engineering, Minia University, El-Minia. From 1989 to 1993, he worked as a Teaching Assistant in the Electrical Engineering Department, Minia University. From 1993 to 1995, he was a visiting scholar at Cairo University, Cairo, Egypt. From 1995 to 1997, he was a visiting scholar at Texas A&M University, College Station, Texas (with the VLSI group). From 1997 to 2003, he was an Assistant Professor in the Electrical Engineering Department, Minia University. From 2003 to 2005, he was an Associate Professor at Alkharj College of Technology, Alkharj, KSA. From 2005 to 2007, he was a Visiting Professor at Ohio University, Athens, Ohio, USA. He has published more than 70 papers, one book and three book chapters. His research interests include analog and mixed-mode circuit design, low-voltage low-power analog circuits (CMOS and BiCMOS) and digital circuits, current-mode circuits, nanotechnology circuit design, FPGAs, FPAAs, and the implementation of the DES and AES algorithms on FPGAs.
