Abstract:
Transient errors are a major concern for the correct operation of low-level cache memories. Aggressive integration requires effective mitigation of such errors, without extreme overheads in power, timing, or silicon area. We demonstrate a hybrid (hardware-software) scheme that mitigates bit flips in data residing in low-level caches. The methodology is applicable to streaming applications, as we illustrate with a video decoding case study on a state-of-the-art many-core chip, the Single-Chip Cloud Computer, an experimental processor created by Intel Labs. Dedicated on-chip memories are used to keep safe copies of key application data, allowing rollbacks upon error detection. The experimental results illustrate the tradeoff between application delay, consumed energy, and output fidelity as the injected errors are corrected. When output fidelity is treated as a hard constraint, application slack used for mitigation can be reclaimed with dynamic frequency scaling. Output fidelity is guaranteed regardless of the error injection intensity, and the application's timing constraints are respected up to a certain upper bound of error injection.
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Volume: 23, Issue: 3, March 2015)