On an efficient NoC multicasting scheme in support of multiple applications running on irregular sub-networks
Introduction
Advances in technology continue to drive increases in transistor integration capacity. It is estimated that by 2015, 100 billion transistors will be integrated on a 300 mm² die [1]. To exploit this large number of transistors while also addressing the pressing problem of high power consumption in ever bigger chips, the design paradigm is migrating to many-core architectures [1], [2]. Network-on-chip (NoC) [3] has been proposed as the mainstream on-chip network architecture to efficiently interconnect the large number (16 or more) of processing cores integrated in a many-core system. Recent high-profile examples include Intel’s Teraflop [4] and Tilera [5] chips, which feature many-core chip multiprocessor (CMP) architectures with 2D mesh topologies [13] for on-chip interconnect.
With the development of diverse applications and programming models on CMPs, one-to-many and one-to-all communication are becoming more common. For example, in CMPs with cache-coherent shared memory systems, the cache coherence protocols exhibit one-to-many communication characteristics to keep the ordering of different requests or to invalidate shared data on different cache nodes [6]. In [7], it has been observed that 5–10% of the network traffic is one-to-many in nature, across workloads ranging from scientific to commercial, in communication traces of different cache coherence protocols and operand networks. Therefore, efficient support of one-to-many communication in CMPs, particularly hardware multicast support, will benefit a wide range of applications by boosting network performance while reducing power consumption. Unfortunately, to date, only a very limited number of on-chip router designs actually support multicasting [6], [7], [8].
In addition, the following issues make multicast support even more complicated. The first issue is topology irregularity. The large number of cores on a CMP unquestionably offers high parallelism in computation. To better utilize these abundant computation resources, virtualization of the chip becomes a necessity [9], where resources can be distributed among different virtual machines [3]. Applying virtualization [8] at the NoC level allows a single NoC-based CMP to be shared by multiple applications, with each mapped to a different sub-network of the chip [10] either statically [11] or dynamically [12]. Fig. 1 shows an example with three applications arriving at 1 ms, 2 ms, and 3 ms. The three applications are allocated to three sub-networks, which may not have regular shapes (e.g., 2D mesh, torus). Moreover, virtualization requires traffic isolation [8]; that is, communication between nodes in a virtualized region is confined to that sub-network. The irregular sub-network and traffic isolation requirements together rule out routing algorithms designed for regular 2D meshes, such as XY routing, odd–even routing, etc. [13].
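The conflict between XY routing and traffic isolation can be illustrated with a minimal sketch. The coordinate convention, the L-shaped sub-network, and the function names `xy_path` and `stays_inside` are assumptions made here for illustration; they are not taken from the paper.

```python
# Hypothetical illustration: XY routing corrects the X coordinate first,
# then the Y coordinate. On an irregular sub-network the resulting path
# may pass through a node that belongs to another application.

def xy_path(src, dst):
    """Return the list of nodes visited by XY routing from src to dst."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # move along X until the column matches
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then move along Y until the row matches
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def stays_inside(path, subnet):
    """True if every node on the path belongs to the sub-network."""
    return all(node in subnet for node in path)

# Assumed L-shaped sub-network: node (1, 0) is NOT part of it.
subnet = {(0, 0), (0, 1), (1, 1)}
path = xy_path((0, 0), (1, 1))
violates_isolation = not stays_inside(path, subnet)
```

In this assumed layout the XY path crosses node (1, 0), which lies outside the sub-network, so the isolation requirement is violated even though a two-hop path through (0, 1) stays inside it.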
The second issue is the unpredictability of application communication behavior. Different types of applications, such as desktop, server, and embedded applications, will be executed on general-purpose CMPs. It is impossible to pre-characterize the communication patterns among the cores inside a sub-network. As a result, customized NoC routing approaches (like the ones using routing tables [14]) may not be feasible.
Hence, it is important to design an efficient multicast mechanism that supports irregular topologies without the need for a routing table. In this paper, an irregular sub-network oriented multicast strategy is first proposed. Following this strategy, an irregular sub-network oriented multicast routing algorithm, namely Alternative Recursive Partitioning Multicast (AL + RPM), is developed based on RPM [6], an efficient deterministic multicast routing algorithm proposed for regular mesh topologies. To the best of our knowledge, ours is the first multicast routing approach, as opposed to the broadcast-based one [8], that targets irregular sub-networks.
In the rest of the paper, Section 2 reviews the existing work on multicast routing schemes in NoCs. Section 3 presents the preliminaries. Section 4 describes the irregular sub-network oriented multicast routing strategy and algorithm. Section 5 reports the performance evaluation of AL + RPM. Finally, Section 6 concludes the paper.
Section snippets
Related work
Multicast communication has been extensively studied in computer networks and interconnection networks [13]. However, due to the power and area constraints pertaining to NoCs, supporting multicast in NoCs has a different set of requirements. In particular, an efficient multicasting approach for NoCs should achieve low network latency and low power and area consumption. A simple multicasting approach is to send a multicast packet as multiple unicast packets. However, such a scheme suffers from
Architecture and power models
The target NoC architecture is a tile-based NoC composed of N × N tiles interconnected by a 2-D mesh network. Each tile (used interchangeably with "node"), indexed by its coordinate (x, y) or its ID xN + y, where 0 ⩽ x ⩽ N − 1 and 0 ⩽ y ⩽ N − 1, has one processing core and one router. Each router (shown in Fig. 2) connects to its local processing core and four neighbour tiles through bidirectional channels. A 5 × 5 crossbar switch is used as the switching fabric of the router. The arbitration unit arbitrates the
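The tile indexing described above (coordinate (x, y) or ID xN + y) can be sketched as a pair of conversion helpers. The function names `tile_id` and `tile_coord` are hypothetical and are introduced only for this illustration.

```python
# Minimal sketch of the tile indexing: ID = x * N + y on an N x N mesh,
# with 0 <= x <= N - 1 and 0 <= y <= N - 1.

def tile_id(x: int, y: int, n: int) -> int:
    """Map a tile coordinate (x, y) to its ID on an n x n mesh."""
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("coordinate lies outside the mesh")
    return x * n + y

def tile_coord(tid: int, n: int) -> tuple:
    """Invert tile_id: recover (x, y) from an ID on an n x n mesh."""
    if not 0 <= tid < n * n:
        raise ValueError("ID lies outside the mesh")
    return divmod(tid, n)   # quotient is x, remainder is y

# Example on a 4 x 4 mesh.
example_id = tile_id(2, 3, 4)
example_coord = tile_coord(example_id, 4)
```

Because y ranges over 0 to N − 1, the mapping is one-to-one, so either form identifies a tile unambiguously.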
Motivation example and irregular sub-network oriented multicasting strategy
Before the proposed algorithms are described in detail, an example is given to explain the motivation. Fig. 6 shows an irregular sub-network composed of five nodes. A multicast packet is sent from the source node to two destination nodes. The dashed line represents the path if RPM [6] is used. However, since the sub-network is irregular, the dashed path cannot reach the destinations, i.e., at node 4, the packet cannot go West as the link to West is not available in this sub-network.
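The situation in this example, and the kind of fallback it motivates, can be sketched as follows. This is only an illustration under stated assumptions: it handles a single destination, it is not the AL + RPM algorithm itself, and the direction convention (East is +x, North is +y), the link set, and the names `productive` and `choose_output` are hypothetical.

```python
# Hypothetical sketch: if the output channel preferred by a regular-mesh
# routing decision is not available in the sub-network, fall back to another
# output channel that still reduces the distance to the destination.

OFFSETS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def step(node, direction):
    """Return the neighbour of node in the given direction."""
    dx, dy = OFFSETS[direction]
    return (node[0] + dx, node[1] + dy)

def productive(cur, dst):
    """Directions that bring the packet closer to dst (X listed first)."""
    dirs = []
    if dst[0] > cur[0]:
        dirs.append("E")
    elif dst[0] < cur[0]:
        dirs.append("W")
    if dst[1] > cur[1]:
        dirs.append("N")
    elif dst[1] < cur[1]:
        dirs.append("S")
    return dirs

def choose_output(cur, dst, links):
    """Pick the first productive direction whose link exists, else None."""
    for direction in productive(cur, dst):
        if frozenset((cur, step(cur, direction))) in links:
            return direction
    return None   # no productive channel available; not handled in this sketch

# Assumed sub-network links: the West link out of (1, 0) is missing.
links = {frozenset(((1, 0), (1, 1))), frozenset(((1, 1), (0, 1)))}
first_hop = choose_output((1, 0), (0, 1), links)    # West is blocked
second_hop = choose_output((1, 1), (0, 1), links)
```

In this assumed layout the packet goes North first and then West, which still takes two hops, the same length as the blocked West-first path would have had. The sketch returns `None` when no productive channel exists and does not address that case.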
Experiment settings
To evaluate the performance of the AL + RPM multicast routing algorithm, AL + RPM is simulated under traces from real applications and under random traffic. The performance of AL + RPM in terms of power consumption (as defined in Section 3.1) and network latency is compared against bLBDR and multiple unicast. These multicast algorithms are implemented on the cycle-accurate simulator Noxim [25]. The power parameters are based on synthesis results obtained using Synopsys Physical Compiler with a TSMC 90 nm library.
Conclusion
In this paper, an irregular sub-network oriented multicast routing strategy was proposed. The basic idea of this routing strategy is that, if the output channel found by regular topology oriented multicast routing is not available, choose an alternative output channel which also leads to the minimal path to the destination. As a matter of fact, following this strategy, an irregular topology oriented multicast routing algorithm can be designed based on any regular mesh based multicast routing
Acknowledgement
This work has been supported by NSF under grant no. ECCS-0702168 and National Natural Science Foundation of China under grant no. 60873112.
Xiaohang Wang received the B.Eng. degree in communication and electronic engineering from Zhejiang University, China, in 2006. He is currently pursuing the Ph.D. degree in communication and electronic engineering at Zhejiang University, China. His research interests include compilers, parallel programming models, core-based digital SoC and NoC design and test.
References (25)
- S. Borkar, Thousand core chips: a technology perspective, in: Proc. 44th Design Automation Conf., ACM, 2007, pp....
- et al., Challenges and opportunities in many-core computing, Proc IEEE (2008)
- J. Held, J. Bautista, S. Koehl, From a few cores to many: a tera-scale computing research review, Intel Research White...
- et al., A 5-GHz mesh interconnect for a teraflops processor, IEEE Micro (2007)
- et al., On-chip interconnection architecture of the tile processor, IEEE Micro (2007)
- L. Wang, Y. Jin, H. Kim, E.J. Kim, Recursive partitioning multicast: a bandwidth-efficient routing for on-chip, in:...
- N.E. Jerger, L.S. Peh, M. Lipasti, Virtual circuit tree multicasting: a case for on-chip hardware multicast support,...
- S. Rodrigo, J. Flich, J. Duato, Efficient unicast and multicast support for CMPs, in: Proc. 41st IEEE/ACM Int’l Symp....
- A. Gavrilovska, S. Kumar, H. Raj, K. Schwan, V. Gupta, R. Nathuji, R. Niranjan, A. Ranadive, P. Saraiya,...
- Designing Reliable and Efficient Networks on Chips (2009)
- The Raw microprocessor: a computational fabric for software circuits and general-purpose programs, IEEE Micro
Dr. Mei Yang received her Ph.D. in Computer Science from the University of Texas at Dallas in Aug. 2003. She has been an assistant professor in the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas since Aug. 2004. Her research interests include computer architectures, networking, and embedded systems.
Dr. Yingtao Jiang received his Ph.D. in Computer Science from the University of Texas at Dallas in Aug. 2001. He joined the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas in Aug. 2001. He has been an associate professor since Aug. 2007. His research interests include algorithms, computer architectures, VLSI, networking, nano technologies, etc.
Peng Liu received the B.Eng. and M.Eng. degrees in optical engineering from Zhejiang University in 1992 and 1996, respectively, and the Ph.D. degree in communication and electronic engineering from Zhejiang University, China, in 1999. He has been an Associate Professor with the Information Science and Electronic Engineering Department, Zhejiang University, Hangzhou, China, since 2002. His research focuses on embedded processors, multiprocessor systems-on-chip architectures, on-chip interconnection networks, real-time operating systems, compilers, and circuits for communications.