Research note
Adaptive error correction in Orthogonal Latin Square Codes for low-power, resilient on-chip interconnection network
Introduction
Network-on-Chip (NoC) has been proposed as a new SoC design paradigm to address the challenges of increasing interconnect complexity. The basic idea is to use the packet-switching methodology that has been used extensively in computer networks. Network-like communication improves scalability and removes the limitations of complex wiring. NoC is therefore viewed as an enabling solution for fast, reliable, energy-efficient communication among on-chip computing resources.
Variability in integrated circuits causes significant deviations from the prescribed specification. For instance, parameter variations have an unpredictable impact on the speed and power of devices, which can ultimately lead to delay variations of up to 30%. Temporal variations, such as environmental and aging variations, can also threaten circuit functionality by increasing delay over time [1]. Such delay variations can cause failures in the on-chip interconnection network. In recent years, there has been a growing effort to develop reliable, energy-efficient on-chip networks that integrate an increasingly large number of cores on a single chip. The incorporation of different coding schemes in interconnect design is being investigated both to protect against transient malfunctions and to reduce the energy consumption of an on-chip network thanks to the increased reliability. Sridhara and Shanbhag proposed a unified framework of coding for SoC with crosstalk avoidance codes and error control codes [2]. Ganguly et al. proposed a triple error correction/quadruple error detection code [3], which applies a Hamming code followed by duplicate-add-parity (DAP). Huang et al. proposed a self-corrected green coding scheme for on-chip interconnection, which comprises two stages: a green bus coding stage and a triplication EC stage [4]. Lee et al. proposed a resilient on-chip interconnection based on Orthogonal Latin Squares that corrects up to quadruple errors [5].
These studies show that enhancing circuit reliability is gaining significant momentum, and that resilient and low-power design techniques need to be addressed simultaneously at different levels of design abstraction. In our previous work, we introduced a multi-bit error correction code for on-chip interconnection and investigated the word-error rate and hardware cost of the proposed error-correcting code (ECC) compared to other ECC schemes [5]. In this paper, building on that work, we propose an adaptive ECC that can change its error correction capability according to the system's reliability level, reducing the power consumption of the network while providing the required reliability. Experimental results demonstrate the feasibility of the proposed methods for low-power, resilient NoC designs.
Section snippets
Communication in Network-on-Chip (NoC)
Communication in NoC takes place in the form of packets. A packet is further divided into fixed-length flow control units (flits), and a switch forwards flits to the destination. In this paper, we use a switch-to-switch error correction scheme in which an encoder and a decoder are placed between adjacent switches. A core injects a k-bit-wide flit into the network through the network interface (NI). After the ECC encoding unit, the coded flit with additional check bits has a larger width of n, thus enlarging
Orthogonal Latin Square Codes
The class of Orthogonal Latin Square codes (OLSCs) was developed by adding redundancy systematically. One-step majority voting makes the decoding fast and inexpensive [6]. Below we provide a brief explanation of t-bit ECC design with OLSC; we refer the reader to [5] for a more detailed description, including implementation and hardware costs.
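As an illustration of the general OLSC idea (not the authors' implementation, which is described in [5]), the following minimal Python sketch builds a t-error-correcting code over an m x m block of data bits and decodes it with one-step majority voting. It assumes m is prime so that the Latin squares can be generated as (a*r + c) mod m; the block side M = 5 and the function names groups, encode, and decode are hypothetical choices made for this sketch only.

```python
from itertools import product

M = 5  # hypothetical block side (k = M*M data bits); assumed prime for this sketch


def groups(t, m=M):
    """Return 2t families of m parity groups (lists of data-bit indices).

    Families: rows, columns, then Latin squares (a*r + c) mod m for a = 1..m-1.
    Any two data bits share at most one group across all families.
    """
    assert 2 * t <= m + 1, "only m + 1 mutually orthogonal families exist"
    keys = [lambda r, c: r, lambda r, c: c]
    keys += [lambda r, c, a=a: (a * r + c) % m for a in range(1, m)]
    families = []
    for key in keys[:2 * t]:
        family = [[] for _ in range(m)]
        for r, c in product(range(m), repeat=2):
            family[key(r, c)].append(r * m + c)
        families.append(family)
    return families


def encode(data, t, m=M):
    """Append 2*t*m check bits (one parity bit per group) to the data bits."""
    checks = [sum(data[i] for i in grp) % 2
              for family in groups(t, m) for grp in family]
    return list(data) + checks


def decode(word, t, m=M):
    """One-step majority decoding: flip a data bit if more than t of its
    2t parity checks fail."""
    k = m * m
    data, checks = word[:k], word[k:]
    flat = [grp for family in groups(t, m) for grp in family]
    votes = [0] * k
    for grp, chk in zip(flat, checks):
        syndrome = (sum(data[i] for i in grp) + chk) % 2
        for i in grp:
            votes[i] += syndrome
    return [bit ^ int(v > t) for bit, v in zip(data, votes)]
```

In this sketch the coded width is n = k + 2tm, so with M = 5 the word grows from 35 bits at t = 1 to 55 bits at t = 3. Because each increment of t simply adds two more families of check bits, the construction is modular, which is the property an adaptive scheme can exploit to trade check bits for correction capability; how the proposed adaptive ECC actually switches between levels is not shown here.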
Benefit of the proposal
To evaluate the power characteristics of the proposed adaptive ECC, each component was modeled in Verilog HDL. In this experiment, the source router sends packets to the sink router as shown in Fig. 2. The power consumption of the codec was extracted using 90 nm technology. The RTL description was synthesized to a gate-level netlist using Synopsys Design Compiler. As part of this step, physical information such as RC parasitic values file, standard delay format, and design
Conclusions
In this paper, we proposed an adaptive ECC that can change its error correction capability according to the system's reliability level. Our experimental results demonstrate that the adaptive ECC provides a trade-off between energy consumption and reliability by adapting the error correction capability at run time. The energy reduction that our proposal brings to the interconnection network will be increasingly favorable in multiprocessor SoCs. We plan to
Acknowledgment
This study was financially supported by the Seoul National University of Science and Technology.
References (8)
- et al. Containing the Nanometer "Pandora-Box": Cross-Layer Design Techniques for Variation Aware Low Power Systems. IEEE Journal on Emerging and Selected Topics in Circuits and Systems (2011)
- et al. Coding for system-on-chip networks: a unified framework. IEEE Trans VLSI (2005)
- et al. Crosstalk-aware channel coding schemes for energy efficient and reliable NOC interconnects. IEEE Trans VLSI (2009)
- Huang P, Fang W, Wang Y, Hwang W. Low power and reliable interconnection with self-corrected green coding scheme for...