Demonstrating HW–SW Transient Error Mitigation on the Single-Chip Cloud Computer Data Plane | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Transient errors are a major concern for the correct operation of low-level cache memories. Aggressive integration requires effective mitigation of such errors, without extreme overheads in power, timing, or silicon area. We demonstrate a hybrid (hardware-software) scheme that mitigates bit flips in data that reside in low-level caches. The methodology applies to streaming applications, which we illustrate with a video decoding case study on a state-of-the-art many-core chip: the Single-Chip Cloud Computer, an experimental processor created by Intel Labs. Dedicated on-chip memories keep safe copies of key application data, allowing rollbacks upon error detection. The experimental results illustrate the tradeoff between application delay, consumed energy, and output fidelity as the injected errors are corrected. When output fidelity is treated as a hard constraint, the application slack spent on mitigation can be reclaimed with dynamic frequency scaling. Output fidelity is guaranteed regardless of the error injection intensity, and the application's timing constraints are respected up to a certain upper bound of error injection.
Page(s): 507 - 519
Date of Publication: 07 April 2014
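
The abstract describes keeping safe copies of key data and rolling back when an error is detected. The following is a minimal software-only sketch of that general idea, not the authors' implementation. The function name `process_with_rollback`, the `inject` fault hook, the retry limit, and the use of a CRC32 checksum as the error detector are all illustrative assumptions; the paper's scheme relies on dedicated on-chip memories and its own detection mechanism.

```python
import zlib


def process_with_rollback(chunks, work, inject=None, max_retries=3):
    """Process a stream chunk by chunk, rolling back to a safe copy on corruption.

    chunks:      iterable of bytes objects (e.g. pieces of a video stream)
    work:        function applied to each verified chunk
    inject:      optional hypothetical fault hook, called as
                 inject(index, attempt, buffer), that may flip bits in buffer
    max_retries: assumed limit on rollbacks per chunk
    """
    outputs = []
    rollbacks = 0
    for index, chunk in enumerate(chunks):
        safe_copy = bytes(chunk)           # stand-in for the protected safe copy
        checksum = zlib.crc32(safe_copy)   # stand-in for the error detector
        working = bytearray(chunk)         # stand-in for data held in the vulnerable cache
        for attempt in range(max_retries + 1):
            if inject is not None:
                inject(index, attempt, working)
            if zlib.crc32(working) == checksum:
                break                      # data is intact, safe to consume
            working = bytearray(safe_copy) # rollback: restore from the safe copy
            rollbacks += 1
        else:
            raise RuntimeError(f"chunk {index} still corrupted after {max_retries} retries")
        outputs.append(work(bytes(working)))
    return outputs, rollbacks
```

Each rollback costs extra time, which mirrors the delay versus fidelity tradeoff mentioned in the abstract: the output stays correct, while the rollback count grows with the injection rate.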