EDXY – A low cost congestion-aware routing algorithm for network-on-chips

https://doi.org/10.1016/j.sysarc.2010.05.002

Abstract

In this paper, an adaptive routing algorithm for two-dimensional mesh network-on-chips (NoCs) is presented. The algorithm, which is based on Dynamic XY (DyXY), is called Enhanced Dynamic XY (EDXY). It is congestion-aware and more tolerant of link failures than the DyXY algorithm. In contrast to the DyXY algorithm, it can avoid congestion when routing from the current switch to a destination whose X position (Y position) is exactly one unit away from the switch X position (Y position). This is achieved by adding two congestion wires (one in each direction) between each pair of cores, which indicate the existence of congestion in a row (column). The same wires may be used to signal a link failure in a row (column). These signals enable the routing algorithm to avoid these paths when other paths exist between the source and destination pair. To assess the latency of the proposed algorithm, uniform, transpose, hotspot, and realistic traffic profiles for packet injection are used. The simulation results reveal that EDXY can achieve lower latency than other adaptive routing algorithms across all workloads examined, with a 20% average and 30% maximum latency reduction on SPLASH-2 benchmarks running on a 49-core CMP. The area of the technique is about the same as that of the other routing algorithms.

Introduction

Recently, on-chip transistor density has increased, enabling the integration of dozens of intellectual property cores on a single die to form systems-on-chip (SoCs). One byproduct of this greater integration is that, for communication in these systems, shared buses should be replaced by interconnection networks. The network-on-chip (NoC) has been proposed as a new paradigm for realizing complex SoCs [1], [2]. NoCs scale better than traditional forms of on-chip interconnection and have better performance and fault tolerance characteristics [2]. Among the possible topologies, the two-dimensional mesh is one of the most common [3], [4].

In NoCs, routing algorithms are used to determine the path of a packet from the source to the destination. These algorithms are classified as deterministic or adaptive. Deterministic routing algorithms are simple to implement, but they cannot balance the load across the links under non-uniform or bursty traffic [5], [6]. Adaptive routing algorithms have been proposed to address these limitations. By distributing load across links more evenly, adaptive algorithms improve network performance and also provide tolerance when a link or router fails. In adaptive routing algorithms, the path of a packet from the source to the destination is determined by the network condition. An adaptive routing algorithm decreases the probability of sending a packet through a congested or malfunctioning link. Despite its implementation complexity, adaptive routing is attractive for large NoCs, especially when these NoCs face non-uniform or bursty traffic.

Among the many existing routing algorithms, we briefly review those related to the algorithm proposed in this work. In [7], a static routing algorithm for two-dimensional meshes, called XY, is introduced. In this routing algorithm, each packet first travels along the X direction and then along the Y direction to reach the destination. With this method, deadlock never occurs, but the algorithm offers no adaptivity. An adaptive routing algorithm named turn-model is introduced in [8], based on which another adaptive routing algorithm called Odd–Even turn is proposed in [9]. To avoid deadlock, the Odd–Even method restricts the positions at which turns are allowed in the mesh topology. Another algorithm, called DyAD, is introduced in [10]. This algorithm combines a static routing algorithm called oe-fix with an adaptive routing algorithm based on the Odd–Even turn algorithm. Depending on the congestion condition of the network, one of the two routing algorithms is invoked. Another adaptive routing scheme is hot potato or deflection routing [11], [12], which is based on the idea of delivering a packet to an output channel at each cycle. If all the channels belonging to minimal paths are occupied, the packet is misrouted. When contention occurs and the desired channel is not available, the packet, instead of waiting, picks any alternative available channel (minimal or non-minimal) to continue moving to the next router; therefore, the router does not need buffers. In hot potato routing, if the number of input channels is equal to the number of output channels at every router node, packets can always find an exit channel and routing is deadlock free. However, livelock is a potential problem in this scheme. Hot potato routing also increases both message latency, even in the absence of congestion, and bandwidth consumption. Accordingly, the performance of hot potato routing is not as good as that of other wormhole routing methods [13].

Adaptive routing has also been used to increase the fault tolerance of the on-chip network. A stochastic communication method has been proposed to deal with permanent and transient faults of network links and nodes [14]. This method has the advantage of simplicity and low overhead. The selection of links, and of the number of redundant copies to be sent on them, is made stochastically at runtime by the network routers. As a result, the transmission latency is unpredictable and hence cannot be guaranteed. Stochastic communication is also inefficient in terms of power dissipation and latency.
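
As a concrete illustration of the static XY scheme reviewed above, the following is a minimal Python sketch, not code from the paper. The port names (`E`, `W`, `N`, `S`, `LOCAL`), the `(x, y)` coordinate convention with Y increasing toward `N`, and the function names are assumptions made only for this example.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a switch; convention assumed here


def xy_next_hop(cur: Coord, dst: Coord) -> str:
    """Return the output port chosen by static XY routing at switch `cur`."""
    cx, cy = cur
    tx, ty = dst
    if cx != tx:                      # finish the X dimension first
        return "E" if tx > cx else "W"
    if cy != ty:                      # then travel along Y
        return "N" if ty > cy else "S"
    return "LOCAL"                    # arrived: deliver to the local core


def xy_path(src: Coord, dst: Coord) -> List[Coord]:
    """List every switch visited from `src` to `dst` under XY routing."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    path = [src]
    cur = src
    port = xy_next_hop(cur, dst)
    while port != "LOCAL":
        mx, my = step[port]
        cur = (cur[0] + mx, cur[1] + my)
        path.append(cur)
        port = xy_next_hop(cur, dst)
    return path
```

Because the decision depends only on the current and destination coordinates, every packet between a given pair of switches follows the same path, which is why the scheme cannot react to congestion.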

An adaptive deadlock-free routing algorithm called Dynamic XY (DyXY) has been proposed in [15]. In this algorithm, which is based on the static XY algorithm, a packet is sent in either the X or the Y direction depending on the congestion condition. To decide on the next hop, it uses local information, namely the current queue length of the corresponding input port in the neighboring routers. It is assumed that the collection of these local decisions leads to a near-optimal path from the source to the destination. The main weakness of DyXY is that relying on local information when making routing decisions can forward the packet onto a path that is congested in routers farther away than the current neighbors. This situation can occur when the routing unit is one unit away from the destination in the X or Y dimension. Such non-optimal routing decisions increase the network latency of the NoC. The technique described in [16] may overcome this problem. It uses global information in making a routing decision, and it requires a mechanism to mix local and global congestion information. This is achieved at the cost of higher hardware overhead.
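
The local decision described above can be sketched as follows. This is a simplified illustration rather than the implementation from [15]: the `queue_len` dictionary (hypothetical, mapping each output port to the queue length reported by the neighbor behind it), the port names, and the rule that ties go to the X direction are all assumptions of this example.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]  # (x, y) position of a switch; convention assumed here


def dyxy_next_hop(cur: Coord, dst: Coord, queue_len: Dict[str, int]) -> str:
    """Pick the next output port using only neighbor queue lengths."""
    cx, cy = cur
    tx, ty = dst
    # Ports that move the packet closer to the destination, if any.
    x_port: Optional[str] = "E" if tx > cx else "W" if tx < cx else None
    y_port: Optional[str] = "N" if ty > cy else "S" if ty < cy else None

    if x_port is None and y_port is None:
        return "LOCAL"                # already at the destination
    if y_port is None:
        return x_port                 # aligned in Y: only X remains
    if x_port is None:
        return y_port                 # aligned in X: only Y remains
    # Both directions make progress: follow the shorter neighbor queue.
    # Breaking ties toward X is an assumption of this sketch.
    return x_port if queue_len[x_port] <= queue_len[y_port] else y_port
```

Note that the comparison sees only the immediate neighbors. A packet that is one column away from its destination and takes the X hop has no remaining choice afterwards, regardless of congestion deeper in that column, which is the weakness discussed in the text.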

In this paper, we propose a technique that solves this problem of the DyXY routing algorithm with little area overhead. In addition, the proposed technique increases tolerance to single link failures compared to the DyXY technique. The rest of this paper is organized as follows: Section 2 describes the basic structure of the XY and DyXY routers. Section 3 describes the proposed routing algorithm and its architecture. The single link failure tolerances of the routing algorithms are compared in Section 4, while experimental results are discussed in Section 5. Finally, the conclusion of the paper is given in Section 6.

Section snippets

XY and DyXY routing mechanisms and their limitations

This section describes the XY and DyXY NoC routing algorithms and their main limitations.

EDXY routing solution

The objective of the EDXY routing algorithm is to avoid the problem of the DyXY algorithm. This is achieved by using a flag which indicates congestion along the path of a row (or column). This flag propagates along a row (or column) and indicates to the adjacent rows (or columns) that this row (or column) is near saturation and should be avoided. Since the congestion flag must propagate along a row (or column), each switch transparently propagates the congestion flag of its prior switch. Also, each router
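
The snippet above is truncated, so the exact EDXY decision rule is not available here. The following Python sketch shows only one possible reading of the idea, under loudly stated assumptions: a flag travelling along a column is modeled as the OR of the congestion states of the switches it has passed, and a switch that is exactly one column (row) away from the destination avoids the hop into that column (row) when its flag is set. The function names, the `col_flag` and `row_flag` inputs, and the fallback to a queue-length comparison are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]  # (x, y) position of a switch; convention assumed here


def flags_from_above(congested: List[bool]) -> List[bool]:
    """flags[i] is True if any switch with index greater than i is congested.

    Models a flag wire on which each switch forwards the flag of its
    prior switch, ORed with that switch's own state (an assumption).
    """
    flags = [False] * len(congested)
    seen = False
    for i in range(len(congested) - 1, -1, -1):
        flags[i] = seen
        seen = seen or congested[i]
    return flags


def edxy_next_hop(cur: Coord, dst: Coord, queue_len: Dict[str, int],
                  col_flag: bool = False, row_flag: bool = False) -> str:
    """Flag-aware next hop; col_flag/row_flag come from the adjacent column/row."""
    cx, cy = cur
    tx, ty = dst
    x_port: Optional[str] = "E" if tx > cx else "W" if tx < cx else None
    y_port: Optional[str] = "N" if ty > cy else "S" if ty < cy else None
    if x_port is None and y_port is None:
        return "LOCAL"
    if y_port is None:
        return x_port
    if x_port is None:
        return y_port
    # An X hop from one column away commits the packet to the destination
    # column; likewise a Y hop from one row away commits it to the row.
    avoid_x = abs(tx - cx) == 1 and col_flag
    avoid_y = abs(ty - cy) == 1 and row_flag
    if avoid_x and not avoid_y:
        return y_port
    if avoid_y and not avoid_x:
        return x_port
    # No usable flag information: fall back to the local comparison.
    return x_port if queue_len[x_port] <= queue_len[y_port] else y_port
```

In this reading, the flag supplies exactly the piece of non-local information that the local queue comparison lacks in the one-unit-away case, while all other decisions remain unchanged.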

Link failure tolerance

The extra wires added to the NoC can be used to enable EDXY to tolerate a single link failure. Under normal conditions, these wires behave as congestion flags and are used to decrease the latency of the routing algorithm, while under faulty conditions the role of these wires changes and they are used to enable EDXY to route all packets to their destinations. With two modifications to the EDXY routing algorithm, this algorithm can tolerate single link failure (unidirectional and bidirectional) in the

Experimental results

To assess the efficiency of the proposed routing algorithm, three other routing algorithms were also implemented: XY, Odd–Even turn-model, and DyXY. Detailed VHDL code for the virtual-channel routers was written, and simulations were carried out to determine their latency-throughput characteristics. For all the switches, the data width was set to 32 bits. Each input virtual channel had a buffer (FIFO) with a size of six flits. The congestion threshold value

Conclusions

In this paper, an enhanced dynamic routing algorithm, called EDXY, was proposed. It improves on the DyXY routing algorithm by being congestion-aware and more tolerant of link failures. In this technique, two congestion wires were added to the router architecture to flag row or column congestion farther away from the current switch. This made it possible to avoid the congested path, and thus to decrease the latency of the algorithm. The same wires were used to

Pejman Lotfi-Kamran received his B.Sc. and M.Sc. degrees in computer engineering from University of Tehran in 2002 and 2005, respectively. His research interests include various aspects of computer architecture, including multi-core architectures, power efficient architectures, service-oriented architectures, and interconnection networks. He has published dozens of papers in prestigious journals and conferences. Pejman is a student member of IEEE and ACM.

References (22)

  • L. Benini et al., Networks on chips: a new SoC paradigm, IEEE Computer (2002)
  • W.J. Dally, B. Towles, Route packets, not wires: on-chip interconnection networks, in: Proceedings of the Design...
  • K. Sankaralingam, R. Nagarajan, P. Gratz, R. Desikan, D. Gulati, H. Hanson, C. Kim, H. Liu, N. Ranganathan, S....
  • S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S....
  • D. Bertsekas et al., Data Networks (1992)
  • W.J. Dally et al., Principles and Practices of Interconnection Networks (2004)
  • Intel Corporation, A touchstone delta system description, in: Intel Advanced Information,...
  • C.J. Glass et al., The turn model for adaptive routing, Journal of the ACM (1994)
  • G.M. Chiu, The odd–even turn model for adaptive routing, IEEE Transactions on Parallel and Distributed Systems (2000)
  • J.C. Hu, R. Marculescu, DyAD – smart routing for networks-on-chip, in: Proceedings of the Design Automation Conference,...
  • E. Nilsson, M. Millberg, J. Öberg, A. Jantsch, Load distribution with the proximity congestion awareness in a network...

Cited by (76)

    • A power-optimized, area-efficient implementation of Connection-Then-Credit NoC physical layer

      2017, Microelectronics Journal
      Citation Excerpt :

      One of the most common approaches to balance data transfer over the network is using congestion-aware adaptive routing algorithms [25]. Lotfi-Kamran et al. proposed an extension to the dynamic XY routing algorithm [26]. The proposed algorithm, called enhanced dynamic XY (EDXY), allocates two wires to only share congestion information in every direction with all neighbor nodes.

    • Dynamic routing algorithm to normalize the routers utilization in mesh based NoC

      2023, 2023 11th International Symposium on Electronic Systems Devices and Computing, ESDC 2023
    Amir-Mohammad Rahmani received his B.S. degree from Mashhad Branch, Azad University, Iran, in 2006, and his M.S. degree from the University of Tehran, Tehran, Iran, in 2009, both in Computer Engineering. He is currently pursuing his Ph.D. in the Computer Systems Laboratory, University of Turku, Finland. His research interests include Low-Power Design, Network-on-chips, Multi-Processor System-on-chip, and 3D ICs.

    Masoud Daneshtalab received his Master’s degree in computer architecture from the School of Electrical and Computer Engineering, University of Tehran, in 2006. Since autumn 2008 he has been working in the Computer Systems Laboratory, University of Turku, and since May 2009 he has been a doctoral candidate of the Graduate School in Electronics, Telecommunications and Automation (GETA). He is expected to receive his PhD degree in Jan 2011. He has expertise in on/off-chip interconnection networks, multiprocessor architectures, network-on-chips (NoC), and low-power digital design. His PhD thesis is focused on topology formation and routing protocols in 2-D and 3-D on-chip networks. Masoud is a member of IEEE and has published more than 40 refereed international journal and conference papers.

    Ali Afzali-Kusha (SM’06) received his B.Sc., M.Sc., and Ph.D. degrees in Electrical Engineering from Sharif University of Technology, the University of Pittsburgh, and the University of Michigan in 1988, 1991, and 1994, respectively. From 1994 to 1995, he was a Post-Doctoral Fellow at the University of Michigan. Since 1995, he has been with the University of Tehran, where he is currently a Professor in the School of Electrical and Computer Engineering and the Director of the Low-Power High-Performance Nanosystems Laboratory. While on research leave from the University of Tehran, he was also a Research Fellow at the University of Toronto and the University of Waterloo in 1998 and 1999, respectively. His current research interests include low-power high-performance design methodologies from the physical design level to the system level for the nanoelectronics era. Dr. Afzali-Kusha is a senior member of IEEE.

    Zainalabedin Navabi, Ph.D., is a professor of electrical and computer engineering at University of Tehran. Dr. Navabi has worked on the design, definition, and implementation of hardware description languages and on the synthesis and testing of digital systems. He has developed and supervised the development of many HDL-related software packages and tools, and has directed projects in VLSI design, test synthesis, simulation, synthesis, and other aspects of digital system design automation. Dr. Navabi is a member of ACM, IEEE, and IEEE Computer Society, and is an active participant in the IEEE DASC committee that sets standards related to hardware description languages.