A novel Wireless Network-on-Chip architecture with distributed directories for faster execution and minimal energy

https://doi.org/10.1016/j.compeleceng.2017.12.038

Abstract

Wireless Network-on-Chip (WNoC) architectures were introduced to improve performance by reducing core-to-core communication latency. Conventional WNoCs broadcast messages, which increases network traffic, communication delay, and power consumption. Studies show that directory-based architectures have the potential to limit message broadcasting and improve performance. This work proposes a novel WNoC architecture with distributed directories (WNoC-DDs) that supports wireless communication and speeds up execution by reducing latency. The VisualSim software package is used to model and simulate the proposed WNoC-DDs, a WNoC with a centralized directory (WNoC-CD), and a traditional 2D mesh under different communication scenarios. The proposed architecture reduces the total hop count and unwanted broadcasting among nodes. Experimental results show that the proposed WNoC-DDs reduces communication delay by up to 20.54% and 5.40% compared to the mesh and WNoC-CD, respectively. Similarly, it reduces power consumption by up to 73.56% and 19.97% compared to the mesh and WNoC-CD, respectively.

Introduction

Network-on-Chip (NoC) has become a dominant interconnect technology because it overcomes the performance limitations of traditional wired interconnects and suits System-on-Chip (SoC) architectures. Recent studies indicate that many products, such as processors, cell phones, memory subsystems, and other embedded products, are integrated on a single chip and interconnected by an NoC [1], [2], [3]. Multicore systems make it easier to solve complex jobs by working in parallel, with improved execution speed and reduced power consumption [4], [5]. Multithreading is a technique in which a central processing unit (CPU) executes several threads simultaneously. Memory-balanced scheduling is a thread scheduling approach that improves performance by balancing memory access requirements, but at the cost of interconnect width and bandwidth [6]. However, programming large-scale multicore architectures is always challenging [7]. Multicore performance improves when cache coherence overhead is reduced; private-memory multicore architectures incur less of this overhead but are expensive. Shared memory offers a trade-off between cost and the cache coherence problem. Snooping protocols address coherence issues but are limited to small core counts, whereas directory protocols suit large core counts. Recent studies indicate that snoopy protocols can be extended to 36 cores, but power and area are the major drawbacks [8]. Additional parallelism and multithreading can improve the performance and efficiency of communication among cores.
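The scaling contrast between snooping and directory protocols can be made concrete with a rough message count. The sketch below is an illustrative simplification and is not taken from the paper: it assumes a snooping request probes every other core, while a directory request costs one lookup message plus one targeted message per sharer. The function names and the fixed sharer count are hypothetical.

```python
def snoop_messages(num_cores: int) -> int:
    """Messages for one coherence request when every other core is probed."""
    return num_cores - 1


def directory_messages(num_sharers: int) -> int:
    """One request to the directory plus one targeted message per sharer."""
    return 1 + num_sharers


if __name__ == "__main__":
    # Assumed workload: each cache line is shared by two cores.
    for cores in (4, 16, 36, 64):
        print(cores, snoop_messages(cores), directory_messages(2))
```

Under these assumptions the snooping cost grows linearly with the core count, while the directory cost depends only on how many cores actually share the line.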

Multicore designs should ensure small chip area, low communication latency, and low power consumption, thereby providing better communication among cores on the chip. NoC architecture is a technology proposed to overcome the large communication delay among cores in a multicore architecture. Several interconnects have been proposed to address latency and power issues, such as the concentrated-sparse mesh [9], millimeter-wave wireless interconnects [10], and crossbar on-chip interconnects [11]. However, these designs still struggle to deliver performance at low cost and with good scalability. Given the constraints and limitations of multicore designs, the development of efficient NoCs has attracted considerable attention.

Routing paths between cores in multicore architectures are chosen by adaptive or non-adaptive algorithms. In traditional mesh multicasting, the popular non-adaptive technique is the XY routing algorithm [12]. The Wireless Network-on-Chip with centralized directory (WNoC-CD) architecture [13] uses an adaptive XY routing algorithm to decide the path between nodes [14]. WNoC-CD also uses buffer management to improve performance by handling queuing delays without affecting the throughput rate [15]. The topology of a multicore architecture plays a predominant role in communication delay. Introducing a directory into a multicore architecture can improve performance, for example through faster execution, and overcome cache coherence problems [16], [17], [18]. However, a centralized directory loses performance and slows connections for several reasons, such as insufficient bandwidth, growth in network size, and heavy traffic [19].
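For readers unfamiliar with XY routing, the following is a minimal sketch of the standard non-adaptive (dimension-ordered) variant on a 2D mesh: a packet first travels along the X dimension until its column matches the destination, then along the Y dimension. It is not the adaptive variant used by WNoC-CD, and the coordinate convention and function names are assumptions made for illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers visited from src to dst, X dimension first, then Y."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:  # correct the X offset first
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:  # then correct the Y offset
        y += step_y
        path.append((x, y))
    return path


def hop_count(src: Coord, dst: Coord) -> int:
    """Number of links traversed, which equals the Manhattan distance."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
    print(hop_count((0, 0), (2, 1)))  # 3
```

Because the route is fully determined by source and destination, the algorithm is simple and deadlock-free on a mesh, but it cannot steer around congested links, which is the motivation for adaptive variants.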

Existing interconnection technologies such as RF-I and UWB have issues with speed, bandwidth, area, RC (resistor-capacitor) wired interconnects, and power. The proposed distributed directories follow the Stanford Directory Architecture for Shared memory (DASH) and address several issues: speed, bandwidth, traffic, area, power consumption, and data synchronization. In detail, the speed problem is reduced by using the XY routing algorithm; bandwidth and traffic issues are reduced by the distributed directories mechanism; area is reduced by using fewer RC interconnects; and power is reduced by factors such as selecting the shortest path to the destination. Data synchronization is also better with distributed directories than with synchronization at the level of individual cores. In this paper, we propose a distributed directories based architecture with wireless routers that overcomes the limitations of a centralized directory by reducing communication delay, hop count, and power consumption.
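The intuition for why distributing directories shortens paths can be illustrated with a toy hop-count comparison. The sketch below is not the paper's model: it assumes a wired 8x8 mesh with no wireless links, one hypothetical centralized directory at (3, 3), and four hypothetical distributed directories, one per 4x4 subnet. Each core is assumed to consult the closest directory, which in this layout is the one in its own subnet.

```python
from itertools import product
from typing import List, Tuple

Coord = Tuple[int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Hop distance between two routers on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def average_hops_to_directory(size: int, directories: List[Coord]) -> float:
    """Average hops from every core in a size x size mesh to its closest directory."""
    cores = list(product(range(size), repeat=2))
    total = sum(min(manhattan(core, d) for d in directories) for core in cores)
    return total / len(cores)


if __name__ == "__main__":
    SIZE = 8  # assumed mesh dimension
    central = [(3, 3)]  # hypothetical centralized directory
    distributed = [(1, 1), (1, 5), (5, 1), (5, 5)]  # hypothetical per-subnet directories
    print(average_hops_to_directory(SIZE, central))      # 4.0
    print(average_hops_to_directory(SIZE, distributed))  # 2.0
```

In this toy layout the average distance to a directory is halved. The actual savings reported in the paper also depend on wireless routers, traffic, and the routing policy, none of which are modeled here.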

Section 2 summarizes related published work. Section 3 introduces the proposed distributed directories based multicore architecture with wireless routers. Section 4 describes the experimental details. Section 5 presents the experimental results and related discussion. Section 6 concludes the work.

Background study

In this section, we discuss some popular network topologies used in WNoC to reduce performance bottlenecks that may limit data throughput and scalability. We also consider the cache coherence issues that complicate data exchange between cores, and how the Stanford DASH architecture addresses coherence using a customized MESI protocol.

Proposed distributed directories for WNoC

To improve the performance of the WNoC architecture, this work introduces distributed directories that manage data synchronization across all subnets while maintaining a minimal routing path, which in turn allows faster execution with minimal energy. The proposed architecture is a hybrid of the traditional WNoC with distributed directories and the DASH architecture. The major goal of the proposed multicore architecture is to reduce the communication delay among the cores by decreasing the

Experimental details

The proposed WNoC architecture with distributed directories is evaluated and compared with traditional mesh and WNoC-CD architectures. To model and simulate our proposed system, we use the VisualSim Architect 15 tool [29]. The tool is popular for designing computation systems and provides the flexibility to model and simulate customized designs. It is efficient for analyzing performance trade-offs between various architectures using bandwidth utilization, communication delay, routing

Results and discussion

Experimental results for all three architectures are discussed in this section. The comparison of communication delay, hop count, and power consumption for each individual task is given in Table 4, Table 5, and Table 6. Table 4 presents the communication delay for all architectures. When the source and destination are directly connected with no intermediate cores, and for in-subnet tasks such as Tasks 21 to 25, the communication delay is identical in all architectures as they basically

Conclusions

The performance of modern Network-on-Chip architectures depends on communication latency, hop count, and power consumption. If the communication setup time is short, system performance should be better. In this work, we introduce a WNoC architecture with distributed directories (WNoC-DDs) to improve the performance-to-power ratio. A directory allows tasks to execute faster by providing an adaptive minimal routing path to the destination node. VisualSim Architect is used to model and

Kishore K. Chidella received the M.Tech degree in Embedded Systems from JNTU, India, and the B.E. degree in ECE from Anna University, India. He is currently working towards the Ph.D. degree in Embedded Systems at Wichita State University, USA. His research interests include embedded systems, sensor monitoring, high-performance computing, and low-power computer architecture. He has published several articles based on his research work.

References (33)

  • J. Liu et al. The role of interconnects in the performance scalability of multicore architectures.
  • A. Ros et al. A scalable organization for distributed directories. J Syst Archit (2010 Mar 31).
  • J. Hu et al. Energy-aware mapping for tile-based NoC architectures under performance constraints.
  • M.B. Taylor et al. Scalar operand networks: on-chip interconnect for ILP in partitioned architectures.
  • J. Held et al. From a few cores to many: a tera-scale computing research overview, White paper (2006).
  • K. Zhu et al. Research on low power scheduling of heterogeneous multi core mission based on genetic algorithm.
  • G.B. Bezerra. Energy consumption in networks on chip: efficiency and scaling (2012).
  • O. Khan et al. DCC: a dependable cache coherence multicore architecture. IEEE Comput Archit Lett (2011).
  • B.K. Daya et al. SCORPIO: a 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering. ACM SIGARCH Comput Archit News (2014 Oct 16).
  • T.C. Xu et al. Exploration of a heterogeneous concentrated-sparse on-chip interconnect for energy efficient multicore architecture.
  • S. Deb et al. Design of an energy efficient CMOS compatible NoC architecture with millimeter-wave wireless interconnects. IEEE Trans Comput (2012 Sep 19).
  • D. Park et al. MoDe-X: microarchitecture of a layout-aware modular decoupled crossbar for on-chip interconnects. IEEE Trans Comput (2014).
  • S.D. Chawade et al. Review of XY routing algorithm for Network-on-Chip architecture. Int J Comput Appl (2012).
  • A. Asaduzzaman et al. An energy-efficient directory based multicore architecture with wireless routers to minimize the communication latency. IEEE Trans Parallel Distrib Syst (2017 Feb 1).
  • D. Zoni et al. CUTBUF: buffer management and router design for traffic mixing in VNET-based NoCs. IEEE Trans Parallel Distrib Syst (2016 Jun 1).
  • M. Karsten et al. Traffic-driven implicit buffer management-delay differentiation without traffic contracts.

Abu Asaduzzaman is currently an Associate Professor of Computer Engineering at Wichita State University. He has received research grants from NSF KS EPSCoR, NVIDIA, and NetApp. His research interests include computer architecture, high-performance computing, and embedded systems. He has authored more than 80 peer-reviewed journal and conference articles based on his research work. He is a member of IEEE and ASEE.

    Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. A. Isazaheh.
