On graceful degradation of chip multiprocessors in presence of faults via flexible pooling of critical execution units


Abstract:

Reliability and manufacturability have emerged as dominant concerns for today's multi-billion-transistor chips. In this paper, we investigate how to degrade a chip multiprocessor (CMP) gracefully in the presence of faults, keeping its architected functionality intact at the expense of some loss of performance. The proposed solution involves sharing critical execution resources among cores to survive faults. Recent research has suggested that large datapath units, such as the FPU and integer division units, are good candidates for execution outsourcing to other working cores in a CMP. In this paper, we focus on the relatively small but critically important integer ALU. Outsourcing ALU operations incurs a large performance penalty, so better solutions are needed to ensure survivability with minimal performance loss. We propose provisioning a shared ALU among a set of cores that can act as a spare for any constituent core in the group. This solution works well for single ALU failures but leads to resource contention when multiple ALUs fail. Simulation case studies on the MediaBench and MiBench benchmarks show that the proposed solution allows the CMP to remain functionally intact, with no performance penalty for single ALU failures and no more than 1.5% performance loss on average when a single ALU fails in each core.
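
The contention behaviour described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's design: the class name `SharedAluGroup`, the one-grant-per-cycle rule, and the round-robin arbitration policy are all hypothetical choices made for illustration. The model shows that a single faulty core always gets the spare ALU, while two or more faulty cores requesting it in the same cycle must take turns.

```python
from dataclasses import dataclass, field


@dataclass
class SharedAluGroup:
    """Toy model: a group of cores backed by one shared spare ALU.

    Assumptions (not taken from the paper): the spare serves at most one
    core per cycle, and ties are broken by round-robin arbitration.
    """
    num_cores: int
    failed: set = field(default_factory=set)  # cores whose own ALU has failed
    _next: int = 0                            # round-robin starting point

    def fail_alu(self, core: int) -> None:
        """Mark the ALU of the given core as faulty."""
        self.failed.add(core)

    def grant(self, requests: set):
        """Return the faulty core that gets the spare ALU this cycle.

        Healthy cores use their own ALU and never compete for the spare.
        Returns None when no faulty core is requesting.
        """
        candidates = requests & self.failed
        for offset in range(self.num_cores):
            core = (self._next + offset) % self.num_cores
            if core in candidates:
                self._next = (core + 1) % self.num_cores
                return core
        return None

    def stalled(self, requests: set) -> set:
        """Return the faulty cores that must wait this cycle."""
        winner = self.grant(requests)
        return (requests & self.failed) - {winner}


group = SharedAluGroup(num_cores=4)
group.fail_alu(1)
print(group.stalled({0, 1, 2}))  # one faulty core: nobody waits
group.fail_alu(3)
print(group.stalled({1, 3}))     # two faulty cores: one of them waits
```

In this sketch a stalled core simply retries in the next cycle. A real design would also have to account for the latency of reaching the shared unit, which the model ignores.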
Date of Conference: 13-15 July 2011
Date Added to IEEE Xplore: 22 August 2011
Conference Location: Athens, Greece
