Elsevier

Advances in Computers

Volume 124, 2022, Pages 151-215
Chapter Six - Approximate communication for energy-efficient network-on-chip

https://doi.org/10.1016/bs.adcom.2021.09.004

Abstract

Approximate computing is being touted as a viable solution for high-performance computation because it relaxes the accuracy constraints of applications. This trend has been accentuated by emerging data-intensive applications in domains such as image/video processing, machine learning, and big data analytics, which tolerate inaccurate outputs within an acceptable variance. Networks-on-Chip (NoCs) have become the accepted method for connecting a large number of on-chip components. With increasing communication demand and the optimization bottleneck in NoC performance and energy consumption, approximate communication leverages relaxed accuracy to build energy-efficient NoCs. We propose approximate designs for traffic regulation, bufferless NoC, and multiplane NoC. These designs improve network performance and reduce power consumption by reducing network load, optimizing data transmission, and optimizing the network architecture, respectively. The approximate communication designs substantially improve NoC energy efficiency while maintaining low application error.

Introduction

Approximation, which trades output accuracy for gains in performance and energy efficiency, has gained wide recognition as a route to energy-efficient hardware design [1]. Approximate designs rely on the ability of applications to tolerate computation on noisy or erroneous data, or imprecision in the computation results. A large number of applications in machine learning, search, scientific computing, and multimedia are inherently tolerant of approximation [2]. Since inexactness is acceptable, these applications allow approximate data to be stored, computed on, or transmitted. Their error tolerance motivates approximate hardware designs that achieve high performance and energy efficiency.

Approximate computing, as an emerging performance-efficient paradigm, is now widely used in computer architecture design, including approximate memory systems [3, 4], value approximation in CPU-based [5, 6] and GPU-based systems [7, 8], relaxed synchronization [9], and resilience-aware circuit clocking schemes [10]. Compute-based approximation techniques use inexact compute units [11–19] or neural network models [20, 21] for code acceleration. Memory-based techniques exploit data similarity across memory hierarchies to achieve larger capacity, energy efficiency, or lifetime optimization. A significant portion of research on hardware approximation techniques has focused either on computation units for accelerated inexact execution or on the storage hierarchy (cache/DRAM-based) for low-overhead (area/power) memory.

Approximate communication techniques also deserve attention. With increasing on-chip core counts, the network-on-chip (NoC) has emerged as the most competent method for on-chip communication in large-scale parallel systems. It connects varied on-chip components, such as cores, caches, and memory controllers, and it provides the communication needed to exchange data between parallel threads and to maintain data coherence. However, NoCs consume a significant amount of power in modern chip multiprocessors (CMPs) [22], and energy efficiency has been a primary concern in NoC designs [23]. Reducing NoC power while increasing performance is essential for scaling up to larger chip multiprocessor systems. By relaxing accuracy in exchange for performance improvement and energy savings, approximate techniques are a promising direction for research on energy-efficient designs.

This chapter, with a focus on the approximate communication design for energy-efficient NoC, mainly conducts exploratory research in the following three aspects:

First, a dynamic traffic regulation scheme is proposed for approximate communication in NoCs. Network congestion is one of the main factors that affect transmission delay, and different traffic flows have different impacts on network congestion. This method places an approximation-based traffic regulation structure in the network interface, which reduces the amount of injected data through data approximation and can regulate the injection rate of each node. In addition, a dynamic traffic regulation algorithm adjusts the injection rate of each node according to the impact of its traffic flow on network congestion, which improves NoC performance. Experiments on the PARSEC benchmarks show that this method reduces the average transmission delay by 30.9%, reduces application execution time by 15.8%, and saves 24.4% of dynamic power within 10% quality loss.
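The regulation idea can be pictured with a minimal sketch. It is not the chapter's algorithm: the per-node congestion score, the linear mapping, and the names `regulate_injection_rates` and `words_to_approximate` are all assumptions made for illustration. Nodes that contribute more to congestion receive a lower injection-rate limit, and the data that exceeds the limit is what the network interface would have to approximate away.

```python
def regulate_injection_rates(congestion_share, max_rate=1.0, min_rate=0.2):
    """Map each node's congestion contribution to an injection-rate limit.

    congestion_share: dict of node id -> non-negative score (a hypothetical
    metric; the chapter's actual congestion measure is not shown here).
    """
    peak = max(congestion_share.values(), default=0.0)
    rates = {}
    for node, share in congestion_share.items():
        if peak == 0:
            # No node contributes to congestion: nobody is throttled.
            rates[node] = max_rate
        else:
            # Linear interpolation: share 0 -> max_rate, share == peak -> min_rate.
            rates[node] = max_rate - (max_rate - min_rate) * (share / peak)
    return rates


def words_to_approximate(offered_words, rate_limit, max_rate=1.0):
    """Number of data words that must be approximated to meet the limit."""
    allowed = int(offered_words * rate_limit / max_rate)
    return offered_words - allowed


if __name__ == "__main__":
    # Hypothetical scores: nodes 1, 2, 6 contribute to congestion, node 5 does not.
    rates = regulate_injection_rates({1: 4, 2: 2, 5: 0, 6: 4})
    for node, rate in sorted(rates.items()):
        print(node, round(rate, 2), words_to_approximate(100, rate))
```

A linear mapping is only one possible policy; any monotonically decreasing function of the congestion score would express the same intent.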

Second, a performance optimization method for bufferless NoC based on approximate communication is proposed. By removing the buffers, a bufferless NoC reduces power consumption and area overhead, but it also increases transmission delay and decreases network throughput. A performance analysis of the retransmission-based bufferless NoC shows that packet retransmission is a key factor affecting its performance. To improve this performance, the method introduces a new bufferless NoC architecture that reduces packet retransmission through lossy transmission. It also proposes an approximate packet codec that approximates the missing data. Thus, this method improves the performance of bufferless NoC with extremely low quality loss. Experiments on the PARSEC benchmarks show that, compared with the existing bufferless NoC, this design reduces retransmission by 83.6%, reduces transmission delay by 46.7%, increases network throughput by 92%, and accelerates applications by 1.2 times, while maintaining low application error.
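To make "approximating the missing data" concrete, the sketch below fills flits that were dropped in transit with values estimated from their received neighbours. This is a stand-in technique chosen for illustration; the chapter's actual codec is not described in this excerpt, and the function name `recover_missing` and the neighbour-averaging rule are assumptions.

```python
def recover_missing(flits):
    """Fill dropped flits (marked None) from the nearest received neighbours.

    Assumed rule: average the closest received value on each side; if only
    one side exists, reuse it; if nothing was received, fall back to 0.0.
    """
    out = list(flits)
    n = len(flits)
    for i, value in enumerate(flits):
        if value is not None:
            continue
        # Search outwards in the ORIGINAL data so estimates never chain.
        left = next((flits[j] for j in range(i - 1, -1, -1)
                     if flits[j] is not None), None)
        right = next((flits[j] for j in range(i + 1, n)
                      if flits[j] is not None), None)
        if left is not None and right is not None:
            out[i] = (left + right) / 2
        elif left is not None:
            out[i] = left
        elif right is not None:
            out[i] = right
        else:
            out[i] = 0.0
    return out


if __name__ == "__main__":
    received = [1.0, None, 3.0, None]  # two flits lost, no retransmission
    print(recover_missing(received))
```

The receiver accepts the estimate instead of requesting a retransmission, which is where the delay saving would come from; the price is the estimation error on the lost flits.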

Third, an NoC energy optimization method based on multiplane network design is proposed. The NoC performance optimization usually leads to an increase in area overhead and affects the energy consumption of NoC. In order to reduce the energy consumption of NoC, this method designs a two-plane network structure which includes a lossy subnetwork and a lossless subnetwork. Based on lossy transmission, the lossy subnetwork realizes a lightweight, low-delay, bufferless architecture design. In addition, based on the multiplane transmission design, this method speeds up part of the data transfer and achieves transmission quality control. Thus, this method improves NoC performance while reducing NoC area overhead and power consumption. Based on the PARSEC benchmark experiments, the results show that compared with the single-plane NoC, this method reduces the transmission delay by 41.9%, and saves 48.6% of the NoC area overhead and 25.7% of the NoC power consumption under the same throughput.
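The two-plane idea can be sketched as a dispatch decision at injection time. Everything here is hypothetical: the `approximable` flag, the `lossy_budget` quality-control knob, and the `select_plane` function are illustrative assumptions, not the chapter's interface. Only data marked as error tolerant may use the lossy plane, and only while the share of traffic already sent lossily stays under a budget.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    payload: list
    approximable: bool  # assumed annotation: this data tolerates loss


def select_plane(packet, lossy_share_used, lossy_budget=0.5):
    """Choose the subnetwork for a packet.

    lossy_share_used: fraction of recent traffic sent on the lossy plane.
    lossy_budget: assumed cap on that fraction, acting as quality control.
    """
    if packet.approximable and lossy_share_used < lossy_budget:
        return "lossy"      # lightweight, bufferless, low-delay plane
    return "lossless"       # control data and precise data always go here


if __name__ == "__main__":
    print(select_plane(Packet([1, 2, 3], approximable=True), 0.1))
    print(select_plane(Packet([1, 2, 3], approximable=False), 0.1))
    print(select_plane(Packet([1, 2, 3], approximable=True), 0.7))
```

Falling back to the lossless plane once the budget is exhausted keeps the worst-case quality loss bounded, at the cost of losing the speed-up for that packet.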

The rest of the chapter is organized as follows. Section 2 details the related work. Section 3 presents the approximation-based dynamic traffic regulation design. Section 4 explains the approximate bufferless NoC implementation. Section 5 presents the design of the approximate multiplane NoC.

Related work

Recent studies have applied approximate computing to NoC architecture design for applications that tolerate inaccurate outputs [24–29]. These works explore how approximation can improve performance or energy efficiency and relieve communication bottlenecks through two techniques: communication reduction and dynamic power management. APPROX-NoC [25], DAPPER [24], and DEC-NoC [26] belong to the former. APPROX-NoC reduces injected flits by

Approximation-based dynamic traffic regulation

Different traffic flows have different impacts on network congestion. For example, Fig. 1 shows the network transmission status at a certain time. Packets from nodes 1, 2, and 6 contribute to the network congestion, while packets from node 5 do not. Therefore, controlling the packets injected from nodes 1, 2, and 6 relieves congestion more effectively. However, the transmission state in an NoC can be very complicated. Each router is likely to communicate with others. Its complexity

Approximate bufferless network-on-chip

The NoC, serving as an effective interconnection fabric, connects many on-chip components. It provides better scalability and higher bandwidth than traditional interconnects such as the bus and crossbar [48–50]. However, NoCs consume a significant amount of power in CMPs: 40% of the tile power consumption in the 16-tile MIT RAW chip [51], 28% in the 80-tile Intel TeraFLOPS chip [22], and 19% in the 36-tile SCORPIO chip [52]. Buffers consume a large portion of network

Approximate multiplane network-on-chip

Reducing the power of the NoC while increasing performance is essential for scaling up to larger systems for future CMP designs. Minimizing power consumption requires more efficient use of network resources. Multiplane NoCs have shown their efficiency in total bandwidth usage [23, 61]. Furthermore, multiplane NoCs can be designed with heterogeneous physical subnetworks; as a result, messages are injected into different subnetworks to satisfy different transmission properties. For many

Ling Wang received the B.S. degree in monitoring and control technology from the Harbin University of Science and Technology, China, in 2010, and the M.S. degree in biomedical engineering from the Harbin Institute of Technology, China, in 2012, where he also received his Ph.D. degree in computer applied technology in 2021. He is currently a lecturer at Xidian University. His research interests include high-performance many-core architecture, network-on-chip, and AI accelerators.

References (62)

  • C.-C. Hsiao et al. Energy-aware hybrid precision selection framework for mobile GPUs. Comput. Graph. (2013)
  • S. Mittal. A survey of techniques for approximate computing. ACM Comput. Surv. (2016)
  • A. Raha et al. Quality-aware data allocation in approximate DRAM
  • A. Sampson et al. Approximate storage in solid-state memories. ACM Trans. Comput. Syst. (2014)
  • Y. Luo et al. Characterizing application memory error vulnerability to optimize datacenter cost via heterogeneous-reliability memory
  • J.S. Miguel et al. Load value approximation
  • R. St. Amant et al. General-purpose code acceleration with limited-precision analog computation. ACM SIGARCH Comput. Archit. News (2014)
  • A. Yazdanbakhsh et al. RFVP: rollback-free value prediction with safe-to-approximate loads. ACM Trans. Archit. Code Opt. (2016)
  • J. Sartori et al. Branch and data herding: reducing control and memory divergence for error-tolerant GPU applications. IEEE Trans. Multimedia (2013)
  • L. Renganarayana et al. Programming with relaxed synchronization
  • Y. Wang et al. Resilience-aware frequency tuning for neural-network-based approximate computing chips. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. (2017)
  • H. Esmaeilzadeh et al. Architecture support for disciplined approximate programming
  • S. Venkataramani et al. Quality programmable vector processors for approximate computing
  • A. Chandrasekharan et al. ProACt: a processor for high performance on-demand approximate computing
  • G. Ndour et al. Evaluation of variable bit-width units in a RISC-V processor for approximate computing
  • D. Peroni et al. ARGA: approximate reuse for GPGPU acceleration
  • M. Imani et al. RMAC: runtime configurable floating point multiplier for approximate computing
  • R.R. Osorio et al. Truncated SIMD multiplier architecture for approximate computing in low-power programmable processors. IEEE Access (2019)
  • C.K. Jha et al. SEDA: single exact dual approximate adders for approximate processors
  • T. Moreau et al. SNNAP: approximate computing on programmable SoCs via neural acceleration
  • H. Esmaeilzadeh et al. Neural acceleration for general-purpose approximate programs. Commun. ACM (2014)
  • Y. Hoskote et al. A 5-GHz mesh interconnect for a teraflops processor. IEEE Micro (2007)
  • Z. Li et al. The runahead network-on-chip
  • V.Y. Raparti et al. DAPPER: data aware approximate NoC for GPGPU architectures
  • R. Boyapati et al. APPROX-NoC: a data approximation framework for network-on-chip architectures
  • Y. Chen et al. DEC-NoC: an approximate framework based on dynamic error control with applications to energy-efficient NoCs
  • A.B. Ahmed et al. AxNoC: low-power approximate network-on-chips using critical-path isolation
  • G. Ascia et al. Improving energy consumption of NoC based architectures through approximate communication
  • G. Ascia et al. Approximate wireless networks-on-chip
  • F. Betzel et al. Approximate communication: techniques for reducing communication bottlenecks in large-scale parallel systems. ACM Comput. Surv. (2018)
  • M.F. Reza et al. Approximate communication strategies for energy-efficient and high performance NoC: opportunities and challenges

Xiaohang Wang received the B.Eng. and Ph.D. degrees in communication and electronic engineering from Zhejiang University, in 2006 and 2011, respectively. He is currently an Associate Professor with the South China University of Technology. His research interests include many-core architecture, power efficient architectures, optimal control, and NoC-based systems.
