DOI: 10.1145/1878961.1878979
research-article

OE+IOE: a novel turn model based fault tolerant routing scheme for networks-on-chip

Published: 24 October 2010

Abstract

Network-on-chip (NoC) communication architectures are increasingly used to interconnect cores on chip multiprocessors (CMPs). Permanent faults in NoCs due to fabrication challenges in ultra deep submicron (UDSM) technology nodes and due to wearout have led to an increased emphasis on fault tolerant design techniques. To ensure fault tolerant communication in NoCs, several fault tolerant routing algorithms have been proposed in recent years with the goal of routing flits around faults. A majority of these algorithms are based on the turn model approach due to its simplicity and inherent freedom from deadlock. However, existing turn model based fault tolerant routing algorithms are either too restrictive in the choice of paths that flits can traverse, or are tailored to work efficiently only on very specific fault distribution patterns. In this paper, we propose a novel low overhead fault tolerant routing scheme that combines the odd-even (OE) and inverted odd-even (IOE) turn models to achieve much better fault tolerance than traditional turn model based schemes. The proposed scheme uses replication opportunistically to optimize the balance between energy overhead and arrival rate. Our experimental results indicate that the proposed OE+IOE routing scheme provides better fault tolerance than existing turn model, N-random walk, and dual virtual channel based routing schemes that have been proposed in the literature.
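
As a rough illustration of the two turn models named in the abstract, the following Python sketch encodes the odd-even restrictions from Chiu's odd-even turn model (EN and ES turns prohibited in even columns, NW and SW turns prohibited in odd columns). The abstract does not define the inverted variant, so `ioe_turn_allowed` rests on the assumption that IOE simply swaps the column parity of the OE rules. All function names here are hypothetical, and this is not the authors' implementation.

```python
# Illustrative sketch only; not the routing scheme from the paper.
# A heading is the direction a flit is travelling: "N", "E", "S" or "W".
# A turn is (current heading, next heading); ("E", "N") is an EN turn.

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
HEADINGS = ("N", "E", "S", "W")


def oe_turn_allowed(col: int, heading: str, next_heading: str) -> bool:
    """Odd-even (OE) turn rules.

    EN and ES turns are prohibited in even columns;
    NW and SW turns are prohibited in odd columns.
    """
    if next_heading == OPPOSITE[heading]:
        return False  # 180-degree reversals are simply rejected in this sketch
    even_col = col % 2 == 0
    if heading == "E" and next_heading in ("N", "S"):
        return not even_col  # EN / ES: only allowed in odd columns
    if heading in ("N", "S") and next_heading == "W":
        return even_col  # NW / SW: only allowed in even columns
    return True  # going straight, and every other 90-degree turn


def ioe_turn_allowed(col: int, heading: str, next_heading: str) -> bool:
    """ASSUMPTION: inverted odd-even (IOE) swaps the column parity of OE."""
    return oe_turn_allowed(col + 1, heading, next_heading)


def allowed_in_some_model(col: int, heading: str, next_heading: str) -> bool:
    """True if at least one of the two models permits the move at this column."""
    return (oe_turn_allowed(col, heading, next_heading)
            or ioe_turn_allowed(col, heading, next_heading))
```

Under the parity-swap assumption, any 90-degree turn that one model forbids at a given column is permitted by the other, which suggests why pairing the two models could widen the set of usable paths around a fault. How the paper actually assigns flits or replicas to each model is not stated in the abstract.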

    Published In

    CODES/ISSS '10: Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
    October 2010
    348 pages
    ISBN:9781605589053
    DOI:10.1145/1878961

    Sponsors

    In-Cooperation

    • CEDA
    • IEEE CAS
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. fault-tolerant routing
    2. networks-on-chip

    Qualifiers

    • Research-article

    Conference

    ESWeek '10: Sixth Embedded Systems Week
    October 24 - 29, 2010
    Scottsdale, Arizona, USA

    Acceptance Rates

    Overall Acceptance Rate 280 of 864 submissions, 32%

    Cited By

    • (2024) ReD: A Reliable and Deadlock-Free Routing for 2.5-D Chiplet-Based Interposer Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:12 (4599-4612). DOI: 10.1109/TCAD.2024.3399660. Online publication date: Dec-2024
    • (2023) A Reinforcement Learning Framework With Region-Awareness and Shared Path Experience for Efficient Routing in Networks-on-Chip. IEEE Design & Test 40:6 (76-85). DOI: 10.1109/MDAT.2023.3306719. Online publication date: Dec-2023
    • (2018) LBFT. The Journal of Supercomputing 74:8 (3726-3747). DOI: 10.1007/s11227-016-1935-0. Online publication date: 1-Aug-2018
    • (2017) Bio-inspired fault tolerant network on chip. Integration 58 (155-166). DOI: 10.1016/j.vlsi.2017.04.004. Online publication date: Jun-2017
    • (2017) Conclusions and Future Work. Invasive Computing for Mapping Parallel Programs to Many-Core Architectures (157-161). DOI: 10.1007/978-981-10-7356-4_7. Online publication date: 30-Dec-2017
    • (2015) Fault-Tolerant Dynamic Adaptive Routing in NoC. i-manager's Journal on Computer Science 2:4 (12-17). DOI: 10.26634/jcom.2.4.3329. Online publication date: 15-Feb-2015
    • (2014) HEFT. Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis (1-10). DOI: 10.1145/2656075.2656087. Online publication date: 12-Oct-2014
    • (2014) SPMCloud. ACM Transactions on Design Automation of Electronic Systems 19:3 (1-45). DOI: 10.1145/2611755. Online publication date: 23-Jun-2014
    • (2014) A low overhead, fault tolerant and congestion aware routing algorithm for 3D mesh-based Network-on-Chips. Microprocessors & Microsystems 38:8 (991-999). DOI: 10.1016/j.micpro.2014.09.005. Online publication date: 1-Nov-2014
    • (2013) Fault Diagnosis and Reconfiguration Method for Network-on-Chip Based Multiple Processor Systems with Restricted Private Memories. IEICE Transactions on Information and Systems E96.D:9 (1914-1925). DOI: 10.1587/transinf.E96.D.1914. Online publication date: 2013
