Abstract:
Fault-tolerant design plays a key role in current nanometric technologies, motivating research on fault mitigation techniques for NoC-based MPSoCs. Most state-of-the-art papers present partial solutions for designing a fault-tolerant MPSoC, i.e., they propose fault-tolerant mechanisms for either the NoC or the processing elements (PEs). The goal of this paper is to propose a comprehensive set of recovery mechanisms, organized in a layered stack, that ensures the correct execution of applications in the presence of transient or permanent faults in both the NoC and the PEs. Faults injected into the NoC may force it to operate in degraded mode or require a search for fault-free paths. In both cases, an end-to-end recovery mechanism reestablishes communication in less than 50 microseconds. Faults injected into the PEs trigger a lightweight, fast task relocation protocol that executes in less than one millisecond.
Published in: 2016 17th Latin-American Test Symposium (LATS)
Date of Conference: 06-08 April 2016
Date Added to IEEE Xplore: 02 June 2016
Electronic ISBN: 978-1-5090-1331-9