Abstract:
Reliability and manufacturability have emerged as dominant concerns for today's multi-billion-transistor chips. In this paper, we investigate how to degrade a chip multiprocessor (CMP) gracefully in the presence of faults, keeping its architected functionality intact at the expense of some performance. The proposed solution shares critical execution resources among cores to survive faults. Recent research has suggested that large datapath units, such as the FPU and integer division units, are good candidates for outsourcing execution to other working cores in the CMP. Here we focus on the relatively small but critically important integer ALU. Outsourcing ALU operations incurs a large performance penalty, so better solutions are needed to ensure survivability with minimal performance loss. We propose provisioning a shared ALU among a set of cores that can act as a spare for any core in the group. This solution works well for single ALU failures, but leads to resource contention when multiple ALUs fail. Simulation case studies on the MediaBench and MiBench benchmarks show that the proposed solution keeps the CMP functionally intact, with no performance penalty for a single ALU failure and an average performance loss of no more than 1.5% when a single ALU fails in each core.
Published in: 2011 IEEE 17th International On-Line Testing Symposium
Date of Conference: 13-15 July 2011
Date Added to IEEE Xplore: 22 August 2011