research-article

OE+IOE: a novel turn model based fault tolerant routing scheme for networks-on-chip

Authors: Sudeep Pasricha, Howard Jay Siegel

CODES/ISSS '10: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Pages 85 - 94

https://doi.org/10.1145/1878961.1878979

Published: 24 October 2010

Abstract

Network-on-chip (NoC) communication architectures are increasingly used to interconnect cores on chip multiprocessors (CMPs). Permanent faults in NoCs, caused by fabrication challenges in ultra deep submicron (UDSM) technology nodes and by wearout, have led to an increased emphasis on fault tolerant design techniques. To ensure fault tolerant communication in NoCs, several fault tolerant routing algorithms have been proposed in recent years with the goal of routing flits around faults. A majority of these algorithms are based on the turn model approach because of its simplicity and inherent freedom from deadlock. However, existing turn model based fault tolerant routing algorithms are either too restrictive in the choice of paths that flits can traverse, or are tailored to work efficiently only on very specific fault distribution patterns. In this paper, we propose a novel low overhead fault tolerant routing scheme that combines the odd-even (OE) and inverted odd-even (IOE) turn models to achieve much better fault tolerance than traditional turn model based schemes. The proposed scheme uses replication opportunistically to balance energy overhead against arrival rate. Our experimental results indicate that the proposed OE+IOE routing scheme provides better fault tolerance than existing turn model, N-random walk, and dual virtual channel based routing schemes proposed in the literature.
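To make the idea of combining two turn models concrete, the following minimal Python sketch checks whether a turn is permitted at a mesh node. The OE restrictions follow the odd-even turn model as it is commonly stated (EN and ES turns forbidden in even columns, NW and SW turns forbidden in odd columns). The abstract does not spell out the IOE rules, so the sketch assumes IOE is simply OE with the column parity swapped. The names `OE_FORBIDDEN`, `IOE_FORBIDDEN`, `turn_allowed`, and `models_permitting` are hypothetical, and the sketch covers neither the authors' router design nor their replication policy.

```python
# Turn names: "EN" means a packet travelling East turns to head North.
# Odd-even (OE) rules as commonly stated: EN and ES turns are forbidden in
# even columns; NW and SW turns are forbidden in odd columns.
OE_FORBIDDEN = {0: {"EN", "ES"}, 1: {"NW", "SW"}}  # key: column parity

# ASSUMPTION: inverted odd-even (IOE) is modelled here as OE with the
# column parity swapped. The abstract does not define the IOE rules.
IOE_FORBIDDEN = {0: OE_FORBIDDEN[1], 1: OE_FORBIDDEN[0]}


def turn_allowed(forbidden, column, heading, next_heading):
    """Return True if moving from `heading` to `next_heading` is permitted
    at a node in `column` under the given turn model.

    Headings are single letters: "N", "E", "S", "W". Continuing straight is
    always permitted. 180-degree turns are not modelled in this sketch.
    """
    if heading == next_heading:
        return True
    return (heading + next_heading) not in forbidden[column % 2]


def models_permitting(column, heading, next_heading):
    """List which of the two turn models permit the requested turn."""
    models = {"OE": OE_FORBIDDEN, "IOE": IOE_FORBIDDEN}
    return [
        name
        for name, rules in models.items()
        if turn_allowed(rules, column, heading, next_heading)
    ]


if __name__ == "__main__":
    # An East-to-North turn in an even column is blocked under OE but,
    # under the parity-swap assumption, permitted under IOE.
    print(models_permitting(2, "E", "N"))
```

Under the parity-swap assumption, every 90-degree turn at every column is permitted by at least one of the two models, which suggests why pairing them could widen the set of usable paths around a fault.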



Index Terms

OE+IOE: a novel turn model based fault tolerant routing scheme for networks-on-chip
1. Hardware
  1. Very large scale integration design
    1. VLSI system specification and constraints


Published In

CODES/ISSS '10: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

October 2010, 348 pages

ISBN: 9781605589053

DOI: 10.1145/1878961

Program Chairs: Tony Givargis (University of California, Irvine, CA), Adam Donlin (Xilinx, USA)

Copyright © 2010 ACM.


In-Cooperation: CEDA, IEEE CAS, IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ESWeek '10: Sixth Embedded Systems Week

October 24 - 29, 2010

Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 280 of 864 submissions, 32%
