M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Coarse-grained reconfigurable architectures (CGRAs) offer both energy efficiency and performance for embedded systems, and they are therefore increasingly deployed in aerospace, automotive, and security applications, where reliability is also a primary criterion. However, state-of-the-art fault-tolerant strategies for CGRAs apply either a temporal or a spatial scheme, such as redundancy, periodic detection, workload balancing, or reconfiguration, and so fail to exploit the dynamic and partial reconfiguration capabilities of CGRAs. In addition, vulnerable judging circuits and inflexible mode shifting limit the reliability of fault-tolerant CGRA designs. This article proposes a novel multimode fault-tolerant framework for CGRAs that combines spatially redundant data paths with temporally redundant voters, reducing the number of vulnerable judging circuits while balancing performance and reliability. The framework also allows the reliability level to be changed at runtime through an online configuration transformation method based on precompiled patterns. Within the proposed framework, we systematically explored a design space spanning various combinations of the mainstream schemes, using a Markov process model to compare their effectiveness, and selected five design points as the available modes after weighing fault tolerance against time overhead on the CGRA. The framework is evaluated on a cycle-accurate CGRA simulator under both permanent and transient faults. The experimental results show that the fault coverage rate for single transient or permanent faults increases from 71.74% to 93.84%, which corresponds to a 31.03% improvement in system fault tolerance over state-of-the-art methods. Mean time to failure (MTTF) and reconfiguration latency also improve substantially over baseline designs.
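The abstract states that redundancy schemes were compared with a Markov process model and evaluated by MTTF, but it does not give the model itself. As a rough illustration of the general technique only, the sketch below computes MTTF for a continuous-time Markov chain with an absorbing failure state by solving (-Q_T) t = 1, where Q_T is the generator restricted to the non-failed states. The two models (a simplex unit and textbook triple modular redundancy with a perfect voter, constant failure rate `lam`, and an optional repair or reconfiguration rate `mu`) and all function names are hypothetical assumptions, not the authors' models.

```python
from fractions import Fraction


def mttf(q_transient, start=0):
    """Mean time to absorption (failure) of a CTMC, starting from state `start`.

    q_transient is the generator restricted to the transient (non-failed) states.
    Each row sums to <= 0; the deficit is the rate into the absorbing failure state.
    Solves (-Q_T) t = 1 by Gauss-Jordan elimination using exact fractions.
    """
    n = len(q_transient)
    # Augmented matrix [-Q_T | 1].
    a = [[-Fraction(x) for x in row] + [Fraction(1)] for row in q_transient]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return a[start][n]


def simplex(lam):
    """Hypothetical single unit: healthy -> failed at rate lam."""
    return [[-lam]]


def tmr(lam, mu=0):
    """Hypothetical TMR with a perfect voter.

    State 0: all three modules healthy (any of three can fail, rate 3*lam).
    State 1: one module failed but masked by the voter; a second failure
    (rate 2*lam) defeats the majority vote. mu is an assumed repair rate.
    """
    return [[-3 * lam, 3 * lam],
            [mu, -(2 * lam + mu)]]


if __name__ == "__main__":
    lam = Fraction(1, 1000)  # assumed failure rate per module (per hour)
    print(float(mttf(simplex(lam))))                 # 1/lam
    print(float(mttf(tmr(lam))))                     # 5/(6*lam)
    print(float(mttf(tmr(lam, Fraction(1, 10)))))    # (5*lam + mu)/(6*lam**2)
```

Under these assumptions, unrepaired TMR yields an MTTF of 5/(6λ), slightly below the simplex value of 1/λ, while adding a repair rate μ raises it to (5λ + μ)/(6λ²). This classic result shows why the ability to recover from faults at runtime, and not redundancy alone, can dominate MTTF.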
Page(s): 2938 - 2951
Date of Publication: 24 January 2023