EDXY – A low cost congestion-aware routing algorithm for network-on-chips

doi:10.1016/j.sysarc.2010.05.002

Journal of Systems Architecture

Volume 56, Issue 7, July 2010, Pages 256-264

https://doi.org/10.1016/j.sysarc.2010.05.002

Abstract

In this paper, an adaptive routing algorithm for two-dimensional mesh network-on-chips (NoCs) is presented. The algorithm, which is based on Dynamic XY (DyXY), is called Enhanced Dynamic XY (EDXY). It is congestion-aware and tolerates link failures better than the DyXY algorithm. In contrast to DyXY, it can avoid congestion when routing from the current switch to a destination whose X position (Y position) is exactly one unit away from the switch's X position (Y position). This is achieved by adding two congestion wires (one in each direction) between each two neighboring cores, which indicate the existence of congestion in a row (column). The same wires may be used to signal a link failure in a row (column). These signals enable the routing algorithm to avoid such paths when other paths exist between the source and destination pair. To assess the latency of the proposed algorithm, uniform, transpose, hotspot, and realistic traffic profiles are used for packet injection. The simulation results reveal that EDXY achieves lower latency than the other adaptive routing algorithms across all workloads examined, with a 20% average and 30% maximum latency reduction on SPLASH-2 benchmarks running on a 49-core CMP. The area of the technique is about the same as that of the other routing algorithms.

Introduction

Recently, on-chip transistor density has increased, enabling the integration of dozens of intellectual property cores on a single die to form system-on-chips (SoCs). One byproduct of this greater integration is that, for communication in these systems, shared buses should be replaced by interconnection networks. Network-on-chips (NoCs) have been proposed as a new paradigm for realizing complex SoCs [1], [2]. NoCs scale better than traditional forms of on-chip interconnection and have better performance and fault-tolerance characteristics [2]. Among the possible topologies, the two-dimensional mesh is one of the most common [3], [4].

In NoCs, routing algorithms determine the path of a packet from the source to the destination. These algorithms are classified as deterministic or adaptive. Deterministic routing algorithms are simple to implement, but they cannot balance the load across the links under non-uniform or bursty traffic [5], [6]. Adaptive routing algorithms have been proposed to address these limitations. By distributing load across links more evenly, adaptive algorithms improve network performance and also provide tolerance to link or router failures. In adaptive routing algorithms, the path of a packet from the source to the destination is determined by the network condition, which decreases the probability of sending a packet over a congested or malfunctioning link. Despite its implementation complexity, adaptive routing is attractive for large NoCs, especially when these NoCs face non-uniform or bursty traffic.

Of the many routing algorithms that exist, we briefly review those related to the algorithm proposed in this work. In [7], a static routing algorithm for two-dimensional meshes called XY is introduced. In this routing algorithm, each packet first travels along the X direction and then along the Y direction to reach the destination. With this method deadlock never occurs, but the algorithm offers no adaptivity. An adaptive routing algorithm named turn-model is introduced in [8], based on which another adaptive routing algorithm called Odd–Even turn is proposed in [9]. To avoid deadlock, the Odd–Even method restricts the positions at which turns are allowed in the mesh topology. Another algorithm called DyAD is introduced in [10]. This algorithm combines a static routing algorithm called oe-fix with an adaptive routing algorithm based on the Odd–Even turn algorithm. Depending on the congestion condition of the network, one of the two routing algorithms is invoked. Another adaptive scheme is hot potato or deflection routing [11], [12], which is based on the idea of delivering a packet to an output channel at each cycle. If all the channels belonging to minimal paths are occupied, the packet is misrouted. When contention occurs and the desired channel is not available, the packet, instead of waiting, picks any alternative available channel (minimal or non-minimal) to continue moving to the next router; therefore the router does not need buffers. In hot potato routing, if the number of input channels equals the number of output channels at every router node, packets can always find an exit channel and the routing is deadlock free. However, livelock is a potential problem. Hot potato routing also increases bandwidth consumption and, even in the absence of congestion, message latency. Accordingly, the performance of hot potato routing is not as good as that of other wormhole routing methods [13].

There are also adaptive routing schemes aimed at increasing the fault tolerance of the on-chip network. A stochastic communication method has been proposed to deal with permanent and transient faults of network links and nodes [14]. This method has the advantage of simplicity and low overhead. The selection of links, and of the number of redundant copies to be sent on the links, is made stochastically at runtime by the network routers. As a result, the transmission latency is unpredictable and hence cannot be guaranteed. Stochastic communication is also inefficient in terms of power dissipation and latency.
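
As a concrete reference point for the deterministic XY scheme reviewed above, the following is a minimal Python sketch of its next-hop rule. The port names ("E", "W", "N", "S", "LOCAL"), the coordinate convention (X grows eastwards, Y grows northwards), and the function names are assumptions made for illustration; they are not taken from [7].

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a switch in the mesh

def xy_next_hop(current: Coord, dest: Coord) -> str:
    """Deterministic XY routing: correct the X offset first, then Y."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:                      # still misaligned in X: move along the row
        return "E" if dx > cx else "W"
    if cy != dy:                      # X is aligned: move along the column
        return "N" if dy > cy else "S"
    return "LOCAL"                    # arrived: deliver to the attached core

def xy_path(src: Coord, dest: Coord) -> List[Coord]:
    """List every switch visited from src to dest under XY routing."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    path = [src]
    cur = src
    while cur != dest:
        mx, my = step[xy_next_hop(cur, dest)]
        cur = (cur[0] + mx, cur[1] + my)
        path.append(cur)
    return path
```

Because the route depends only on the source and destination coordinates, every packet between the same pair of switches follows the same path, which is why the scheme cannot route around a congested link.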

An adaptive deadlock-free routing algorithm called Dynamic XY (DyXY) has been proposed in [15]. In this algorithm, which is based on the static XY algorithm, a packet is sent in either the X or the Y direction depending on the congestion condition. To decide on the next hop, it uses local information, namely the current queue length of the corresponding input port in the neighboring routers. It is assumed that the collection of these local decisions leads to a near-optimal path from the source to the destination. The main weakness of DyXY is that relying on local information alone can forward the packet onto a path that is congested at routers farther away than the current neighbors. This situation can arise when the routing unit is one unit away from the destination in the X or Y dimension. Such non-optimal routing decisions increase the network latency of the NoC. The technique described in [16] may overcome this problem by using global information in making a routing decision. It requires a mechanism to mix local and global congestion information, which comes at the cost of higher hardware overhead.
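
The local decision described above can be sketched as follows. This is an illustrative Python model rather than the router logic of [15]: the port names, the `queue_len` dictionary (hypothetical, mapping an output port to the queue length reported by the neighbor on that port), and the tie-break in favor of the X direction are all assumptions.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]  # (x, y); X grows eastwards, Y grows northwards (assumed)

def dyxy_next_hop(current: Coord, dest: Coord, queue_len: Dict[str, int]) -> str:
    """Pick the minimal-path port whose neighbor reports the shorter queue."""
    cx, cy = current
    dx, dy = dest
    if (cx, cy) == (dx, dy):
        return "LOCAL"
    x_port: Optional[str] = "E" if dx > cx else "W" if dx < cx else None
    y_port: Optional[str] = "N" if dy > cy else "S" if dy < cy else None
    if x_port is None:        # already aligned in X: the Y port is forced
        return y_port
    if y_port is None:        # already aligned in Y: the X port is forced
        return x_port
    # Both directions are productive: compare the neighbors' queue lengths.
    # Ties go to the X direction (an assumption of this sketch).
    return x_port if queue_len[x_port] <= queue_len[y_port] else y_port
```

For example, at switch (0, 0) with destination (1, 3), a shorter queue on the east neighbor sends the packet east. It is then aligned with the destination column, so every remaining hop is forced north whatever the congestion in that column, which is the weakness noted above.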

In this paper, we propose a technique that solves this problem of the DyXY routing algorithm with little area overhead. In addition, the proposed technique increases tolerance to single link failures compared to the DyXY technique. The rest of this paper is organized as follows: Section 2 describes the basic structure of the XY and DyXY routers. Section 3 describes the proposed routing algorithm and its architecture. The single link failure tolerances of the routing algorithms are compared in Section 4, while experimental results are discussed in Section 5. Finally, the conclusion of the paper is given in Section 6.

Section snippets

XY and DyXY routing mechanisms and their limitations

This section describes the XY and DyXY NoC routing schemes and their main limitations.

EDXY routing solution

The objective of the EDXY routing algorithm is to avoid the problem of the DyXY algorithm. This is achieved by using a flag which indicates congestion along a row (or column). This flag propagates along the row (or column) and indicates to the adjacent rows (or columns) that this row (or column) is near saturation and should be avoided. Since the congestion flag must propagate along a row (or column), each switch transparently propagates the congestion flag of its prior switch. Also, each router
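
The flag propagation outlined in this snippet can be illustrated with a small Python model of a single row. It is a sketch under stated assumptions, not the paper's circuit: the snippet is cut off before it says how a router raises its own flag, so this model assumes that a switch counts as congested when its buffer occupancy reaches a threshold and that each wire carries the logical OR of the incoming flag and the local condition. The function name, the `occupancy` list, and the threshold value used in the usage note are hypothetical.

```python
from typing import List, Tuple

def propagate_row_flags(occupancy: List[int], threshold: int) -> Tuple[List[bool], List[bool]]:
    """Model the two congestion wires (one per direction) along one mesh row.

    occupancy[i] is the buffer occupancy of switch i, numbered west to east.
    Returns (east_in, west_in): east_in[i] is the flag arriving at switch i on
    the eastbound wire, west_in[i] the flag arriving on the westbound wire.
    """
    n = len(occupancy)
    # Assumption: a switch is locally congested at or above the threshold.
    local = [q >= threshold for q in occupancy]

    east_in = [False] * n
    for i in range(1, n):
        # Pass on the prior switch's incoming flag, ORed with its own state.
        east_in[i] = east_in[i - 1] or local[i - 1]

    west_in = [False] * n
    for i in range(n - 2, -1, -1):
        west_in[i] = west_in[i + 1] or local[i + 1]

    return east_in, west_in
```

With occupancies [0, 0, 5, 0] and a hypothetical threshold of 4, the switches west of the congested one see a raised flag on the westbound wire and the switch east of it sees one on the eastbound wire, so each learns of congestion beyond its immediate neighbor.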

Link failure tolerance

The extra wires added to the NoC can be used to enable EDXY to tolerate a single link failure. Under normal conditions these wires behave as congestion flags and are used to decrease the latency of the routing algorithm, while under faulty conditions their role changes and they are used to enable EDXY to route all packets to their destinations. With two modifications to the EDXY routing algorithm, this algorithm can tolerate single link failure (unidirectional and bidirectional) in the

Experimental results

To assess the efficiency of the proposed routing algorithm, three other routing algorithms were also implemented: XY, Odd–Even turn-model, and DyXY. Detailed VHDL code for the virtual-channel routers was written, and simulations were carried out to determine their latency-throughput characteristics. For all the switches, the data width was set to 32 bits. Each input virtual channel had a buffer (FIFO) with a size of six flits. The congestion threshold value

Conclusions

In this paper, an enhanced dynamic routing algorithm, called EDXY, was proposed. It is congestion-aware and tolerates link failures better than the DyXY routing technique, which it improves upon. In this technique, two congestion wires were added to the router architecture to flag row or column congestion farther away from the current switch. This enabled the congested path to be avoided, thus decreasing the latency of the algorithm. The same wires were used to

Pejman Lotfi-Kamran received his B.Sc. and M.Sc. degrees in computer engineering from the University of Tehran in 2002 and 2005, respectively. His research interests include various aspects of computer architecture, including multi-core architectures, power-efficient architectures, service-oriented architectures, and interconnection networks. He has published dozens of papers in prestigious journals and conferences. Pejman is a student member of IEEE and ACM.

References (22)

L. Benini et al.
Networks on chips: a new SoC paradigm
IEEE Computer
(2002)
W.J. Dally, B. Towles, Route packets, not wires: on-chip interconnection networks, in: Proceedings of the Design...
K. Sankaralingam, R. Nagarajan, P. Gratz, R. Desikan, D. Gulati, H. Hanson, C. Kim, H. Liu, N. Ranganathan, S....
S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S....
D. Bertsekas et al.
Data Networks
(1992)
W.J. Dally et al.
Principles and Practices of Interconnection Networks
(2004)
Intel Corporation, A touchstone delta system description, in: Intel Advanced Information,...
C.J. Glass et al.
The turn model for adaptive routing
Journal of the ACM
(1994)
G.M. Chiu
The odd–even turn model for adaptive routing
IEEE Transactions on Parallel and Distributed Systems
(2000)
J.C. Hu, R. Marculescu, DyAD – smart routing for networks-on-chip, in: Proceedings of the Design Automation Conference,...

E. Nilsson, M. Millberg, J. Öberg, A. Jantsch, Load distribution with the proximity congestion awareness in a network...

Amir-Mohammad Rahmani received his B.S. degree from Mashhad Branch, Azad University, Iran, in 2006, and his M.S. degree from the University of Tehran, Tehran, Iran, in 2009, both in Computer Engineering. He is currently pursuing his Ph.D. in the Computer Systems Laboratory, University of Turku, Finland. His research interests include Low-Power Design, Network-on-chips, Multi-Processor System-on-chip, and 3D ICs.

Masoud Daneshtalab received his Master’s degree in computer architecture from the School of Electrical and Computer Engineering, University of Tehran, in 2006. Since autumn 2008 he has been working in the Computer Systems Laboratory, University of Turku, and since May 2009 he has been a doctoral candidate of the Graduate School in Electronics, Telecommunications and Automation (GETA). He is expected to receive his PhD degree in Jan 2011. He has expertise in on/off-chip interconnection networks, multiprocessor architectures, network-on-chips (NoC), and low-power digital design. His PhD thesis is focused on topology formation and routing protocols in 2-D and 3-D on-chip networks. Masoud is a member of IEEE and has published more than 40 refereed international journal and conference papers.

Ali Afzali-Kusha (SM’ 06) received his B.Sc., M.Sc., and Ph.D. degrees in Electrical Engineering from Sharif University of Technology, the University of Pittsburgh, and the University of Michigan in 1988, 1991, and 1994, respectively. From 1994 to 1995, he was a Post-Doctoral Fellow at the University of Michigan. Since 1995, he has been with the University of Tehran, where he is currently a Professor in the School of Electrical and Computer Engineering and the Director of the Low-Power High-Performance Nanosystems Laboratory. Also, while on a research leave from the University of Tehran, he was a Research Fellow at the University of Toronto and the University of Waterloo in 1998 and 1999, respectively. His current research interests include low-power high-performance design methodologies from the physical design level to the system level for the nanoelectronics era. Dr. Afzali-Kusha is a senior member of IEEE.

Zainalabedin Navabi, Ph.D., is professor of electrical and computer engineering at University of Tehran. Dr. Navabi has worked in the design, definition, and implementation of hardware description languages and the synthesis and testing of digital systems. He has developed and supervised the development of many HDL-related software packages and tools, and has directed projects in VLSI design, test synthesis, simulation, synthesis, and other aspects of digital system design automation. Dr. Navabi is a member of ACM, IEEE, and IEEE Computer Society, and is an active participant in the IEEE DASC committee that sets standards related to hardware description languages.
