DOI: 10.1145/2660540.2660986
tutorial

A Fast Runtime Fault Recovery Approach for NoC-Based MPSoCs for Performance Constrained Applications

Published: 01 September 2014

Abstract

Mechanisms for runtime fault tolerance in Multi-Processor Systems-on-Chip (MPSoCs) are mandatory to cope with transient and permanent faults. This issue is even more relevant in nanotechnologies due to process variability, aging effects, and susceptibility to upsets, among other factors. The literature presents isolated solutions to deal with faults in the MPSoC communication infrastructure. In this context, one gap to be filled is the integration of all layers, resulting in a solution that copes with Network-on-Chip (NoC) faults from the physical layer up to the application layer. The goal of this work is to present an integrated runtime approach to cope with NoC faults in MPSoCs. The original contribution is a set of hardware and software mechanisms that ensure both efficient and reliable communication in NoC-based MPSoCs. The proposal has an acceptable silicon area overhead and a small memory footprint. In the experiments, benchmarks (synthetic and real MPSoC applications) were simulated with thousands of random fault injections, and all of them executed correctly. Moreover, the average application execution time overhead is lower than 0.5%. This suggests that the proposed fault-tolerant method could be used in applications with reliability and performance constraints.
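The execution time overhead quoted in the abstract is a relative measure. A minimal sketch of how such a figure is typically computed is given here, assuming a fault-free baseline run and a set of runs with injected faults. The function names and the cycle counts are hypothetical and purely illustrative; they are not taken from the paper or its experiments.

```python
from statistics import mean


def overhead_percent(baseline: float, with_faults: float) -> float:
    """Relative execution-time overhead of one run, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (with_faults - baseline) / baseline


def average_overhead_percent(baseline: float, runs: list[float]) -> float:
    """Mean overhead over several fault-injection runs of one application."""
    if not runs:
        raise ValueError("at least one run is required")
    return mean(overhead_percent(baseline, t) for t in runs)


# Hypothetical cycle counts (NOT results from the paper).
baseline_cycles = 1_000_000                        # fault-free execution
faulty_runs = [1_002_000, 1_004_000, 1_006_000]    # runs with injected faults

avg = average_overhead_percent(baseline_cycles, faulty_runs)
print(f"average overhead: {avg:.2f}%")
```

Averaging per-run percentages, as done here, is one possible convention; the paper may aggregate its measurements differently.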


Cited By

  • (2018) A Hierarchical and Distributed Fault Tolerant Proposal for NoC-Based MPSoCs. IEEE Transactions on Emerging Topics in Computing 6:4 (524-537). DOI: 10.1109/TETC.2016.2593640. Online publication date: 1-Oct-2018

      Published In

      SBCCI '14: Proceedings of the 27th Symposium on Integrated Circuits and Systems Design
      September 2014
      286 pages
      ISBN:9781450331562
      DOI:10.1145/2660540
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. NoC-based MPSoC
      2. fault recovery
      3. fault-tolerant NoCs
      4. fault-tolerant communication

      Qualifiers

      • Tutorial
      • Research
      • Refereed limited

      Conference

      SBCCI '14
      Acceptance Rates

      SBCCI '14 Paper Acceptance Rate 40 of 130 submissions, 31%;
      Overall Acceptance Rate 133 of 347 submissions, 38%
