Communication modeling of multicast in all-port wormhole-routed NoCs

https://doi.org/10.1016/j.jss.2010.01.016

Abstract

Multicast is one of the most frequently used collective communication operations in multi-core SoC platforms. The bus, as the traditional interconnect architecture for SoC development, has been highly efficient in delivering multicast messages. Because the bus does not scale, it cannot meet the bandwidth requirements of large SoCs. Networks-on-chip (NoCs) have emerged as a scalable alternative to address the increasing communication demands of such systems. However, due to their hop-by-hop communication, NoCs may not be able to deliver multicast operations as efficiently as buses do. Adopting multi-port routers is one approach to improving the performance of multicast operations in interconnection networks. This paper presents a novel analytical model to compute the communication latency of the multicast operation in wormhole-routed interconnection networks employing an asynchronous multi-port router scheme. The model is applied to the Quarc NoC, and its validity is verified by comparing the model predictions against results obtained from a discrete-event simulator developed using OMNET++.

Introduction

Traditionally, the dominant interconnection architecture for systems-on-chip (SoCs) has been bus-based. Driven by an exponential downscaling of feature size (following Moore’s law), SoCs consisting of billions of gates and hundreds of processing units are becoming a reality. Since the bus is inherently non-scalable and ad-hoc interconnection of the IP cores is not affordable in large SoCs, adopting alternatives to serve as the communication medium for SoC development has become inevitable. Networks on-Chip (NoCs) have emerged as the most promising solution (Dally et al., 2001, Benini and De Micheli, 2002). NoCs offer a scalable (Guerriert, 2001, Rijpkema et al., 2003), structured (Benini and De Micheli, 2002), power-efficient and reliable (Benini and De Micheli, 2001, Bolotin et al., 2004) communication medium that addresses the technological and design issues involved in developing large SoCs.

Bus-based SoC development has been able to leverage the advantages of a shared medium to efficiently implement most operations of a collective nature. An example of such operations is cache coherency, which relies on multicasting a message to all or some of the processing cores.

Unlike bus-based architectures, communication in an NoC-based platform is hop-by-hop in nature, and efficient implementation of collective communication operations remains a challenge. NoCs are in principle similar to interconnection networks for parallel computers with multiple processors. The NoC community can therefore draw on the large body of research available in that domain to implement collective communication efficiently in the NoC realm.

Analytical modeling has been widely used to evaluate the performance of communication in interconnection networks. The literature contains numerous analytical performance models of unicast traffic (Draper and Ghosh, 1994, Moadeli et al., 2007) and analyses of unicast traffic in the presence of broadcast traffic (Shahrabi et al., 2003) in the parallel computer and NoC domains. Shahrabi et al. (2000) introduced a model for computing broadcast communication latency in a hypercube-topology network. However, in their system model only unicast traffic was wormhole-routed; broadcast communication was not. Their model was also developed for architectures using one-port routers. Moadeli et al. (2009) presented a model to compute the average message latency of multicast communication in NoCs using all-port routers. This work improves on that earlier work by giving a more detailed presentation of the analytical model, adding the analysis of broadcast alongside multicast communication, and describing the traffic in the Quarc NoC in more detail.

The paper is organized as follows. The next section reviews related work on multicast support. Section 3 introduces a method for analyzing the average message latency of multicast communication in all-port wormhole-routed interconnection networks. Section 4 gives a brief description of the Quarc NoC and its broadcast/multicast routing algorithm. Section 5 compares our analytical evaluation with simulation results. Finally, Section 6 presents the conclusion and future work.

Section snippets

Related work

Collective communication operations have traditionally been adopted to simplify the programming of applications for parallel computers, facilitate the implementation of efficient communication schemes on various machines, and promote the portability of applications across different architectures (Duato et al., 2003). These communication operations are particularly useful in applications which often require global data movement and global control in order to exchange data and synchronize the

The analysis method

This section introduces a model to evaluate the average message latency of multicast communication in wormhole-routed interconnection networks generating both unicast and multicast/broadcast traffic. We assume that the network employs multi-port routers.
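To give a feel for why the number of router ports matters for multicast, the following Python sketch compares contention-free latencies for one-port and all-port injection. It is an illustration only and not the analytical model of this paper: the function names, the approximation that a message of M flits crossing D hops needs about D + M cycles under wormhole switching with no blocking, and the example hop counts are all assumptions made here.

```python
def wormhole_latency(hops, msg_flits, cycle=1.0):
    """Approximate zero-load wormhole latency (assumption: one cycle per hop
    for the header, then the remaining flits follow in a pipeline)."""
    return (hops + msg_flits) * cycle


def multicast_latency_all_port(branch_hops, msg_flits):
    """All branches leave the source at the same time through separate ports,
    so the multicast completes when the slowest branch completes."""
    return max(wormhole_latency(h, msg_flits) for h in branch_hops)


def multicast_latency_one_port(branch_hops, msg_flits):
    """A single injection port sends the branches one after another; each
    branch occupies the port for msg_flits cycles before the next can start."""
    start = 0
    finish_times = []
    for h in branch_hops:
        finish_times.append(start + wormhole_latency(h, msg_flits))
        start += msg_flits
    return max(finish_times)


if __name__ == "__main__":
    branches = [4, 4, 3, 3]  # hypothetical hop counts of four multicast branches
    print(multicast_latency_all_port(branches, 32))
    print(multicast_latency_one_port(branches, 32))
```

For four branches of 4, 4, 3 and 3 hops and a 32-flit message, the sketch gives 36 cycles for the all-port case and 131 cycles for the one-port case. A full model must additionally account for blocking in the network, which this sketch ignores.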

In direct interconnection networks, a router is connected to other neighboring routers through a number of external links. The router is also connected to the local node via one or more internal links. The architectures adopting only one

The Quarc architecture

The Quarc NoC architecture (Moadeli et al., 2008) was introduced as a simple and efficient NoC offering high-performance multicast/broadcast communication services. The Quarc NoC improves on the Spidergon NoC (Coppola et al., 2004) while preserving all its essential features, including wormhole switching, a deterministic routing algorithm, and an efficient on-chip layout.

In the Quarc NoC, nodes are connected by unidirectional links. Let the number of nodes be an even N=2n (where n

Validation

To validate the analytical model we have developed a discrete-event simulator of the Quarc NoC operating at flit level using OMNET++ (Varga, 2002). The schematic of the components in each node is shown in Fig. 6.

The source produces messages according to a Poisson distribution. The passive queue holds two queues, one for messages belonging to multicast traffic and one for unicast traffic, and sends the messages based on their creation time. The passive queue is connected to the router
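The source and passive-queue behaviour described above can be sketched in a few lines of Python. This is not the OMNET++ simulator used in the paper; the names `poisson_arrivals` and `PassiveQueue` are hypothetical, and the sketch assumes that "based on their creation time" means the message with the earliest creation time is sent first.

```python
import random
from collections import deque


def poisson_arrivals(rate, horizon, rng):
    """Creation times of a Poisson process with the given rate on [0, horizon),
    generated from exponentially distributed inter-arrival times."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


class PassiveQueue:
    """Two FIFO queues, one per traffic class (assumed structure)."""

    def __init__(self):
        self.queues = {"unicast": deque(), "multicast": deque()}

    def push(self, kind, created):
        # Messages of one class are expected to arrive in creation order.
        self.queues[kind].append(created)

    def pop(self):
        """Return (kind, created) for the oldest waiting message, or None."""
        heads = [(q[0], kind) for kind, q in self.queues.items() if q]
        if not heads:
            return None
        created, kind = min(heads)
        self.queues[kind].popleft()
        return kind, created


if __name__ == "__main__":
    rng = random.Random(1)
    pq = PassiveQueue()
    for t in poisson_arrivals(0.5, 20.0, rng):   # hypothetical unicast rate
        pq.push("unicast", t)
    for t in poisson_arrivals(0.1, 20.0, rng):   # hypothetical multicast rate
        pq.push("multicast", t)
    while (msg := pq.pop()) is not None:
        print(msg)
```

Keeping one queue per traffic class and comparing only the two heads is enough to release messages in global creation order, because each queue is already sorted by creation time.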

Conclusion

Collective communication operations form an important part of overall traffic in SoCs. Multicast is one of the most frequent collective communication operations. This paper has introduced a novel analytical model to predict the average message latency of wormhole-routed multicast communication in direct interconnection networks adopting asynchronous multi-port routers. The analytical multicast model has been applied to the Quarc NoC, a highly efficient NoC for performing collective

Mahmoud Moadeli received the BSc degree in computer hardware engineering from Shiraz University, Shiraz, Iran in 1998 and the MSc degree in computer systems architecture from Tarbiat Modares University, Tehran, in 2001. He is currently pursuing a PhD in computer science at the University of Glasgow, Glasgow, UK. His research interests include Systems-on-Chip and Networks-on-Chip, with a particular interest in the efficient implementation of collective communication operations.

References (29)

  • Duato, J., et al., 2003. Interconnection Networks: An Engineering Approach.
  • Goossens, K., et al., 2005. Aethereal network on chip: concepts, architectures, and implementations. IEEE Design and Test of Computers.
  • Guerriert, A.G.P., 2001. A generic architecture for on-chip packet-switched interconnections. In: Proceedings of Design...
  • Kleinrock, L., 1975. Queueing Systems Volume I: Theory.

Dr. Wim A. Vanderbauwhede received a PhD in Electrotechnical Engineering from the University of Gent, Belgium in 1996. He is currently Lecturer in Embedded Systems Software at the Department of Computing Science of the University of Glasgow, which he joined in April 2004. His research focuses on novel architectures for Network-on-Chip based heterogeneous multi-core Systems-on-Chip and coarse-grained dynamically reconfigurable systems. Previously, he was a Research Assistant at Strathclyde University, working on high-speed optical packet switching. His research has resulted in over 40 refereed conference and journal papers. Before returning to academic research, Dr. Vanderbauwhede worked as a Mixed-mode Design Engineer and Senior Technology R&D Engineer for Alcatel Microelectronics.