Loading [MathJax]/extensions/MathMenu.js
Highly-reliable integer matrix multiplication via numerical packing | IEEE Conference Publication | IEEE Xplore

Highly-reliable integer matrix multiplication via numerical packing


Abstract:

The generic matrix multiply (GEMM) routine comprises the compute and memory-intensive part of many information retrieval, relevance ranking and object recognition systems...Show More

Abstract:

The generic matrix multiply (GEMM) routine comprises the compute and memory-intensive part of many information retrieval, relevance ranking and object recognition systems. Because of the prevalence of GEMM in these applications, ensuring its robustness to transient hardware faults is of paramount importance for highly-efficientlhighly-reliable systems. This is currently accomplished via error control coding (ECC) or via dual modular redundancy (DMR) approaches that produce a separate set of “parity” results to allow for fault detection in GEMM. We introduce a third family of methods for fault detection in integer matrix products based on the concept of numerical packing. The key difference of the new approach against ECC and DMR approaches is the production of redundant results within the numerical representation of the inputs rather than as a separate set of parity results. In this way, high reliability is ensured within integer matrix products while allowing for: (i) in-place storage; (ii) usage of any off-the-shelf 64-bit floating-point GEMM routine; (iii) computational overhead that is independent of the GEMM inner dimension. The only detriment against a conventional (i.e. fault-intolerant) integer matrix multiplication based on 32-bit floating-point GEMM is the sacrifice of approximately 30.6% of the bitwidth of the numerical representation. However, unlike ECC methods that can reliably detect only up to a few faults per GEMM computation (typically two), the proposed method attains more than “12 nines” reliability, i.e. it will only fail to detect 1 fault out of more than 1 trillion arbitrary faults in the GEMM operations. As such, it achieves reliability that approaches that of DMR, at a very small fraction of its cost. Specifically, a single-threaded software realization of our proposal on an Intel i7-3632QM 2.2GHz processor (Ivy Bridge architecture with AVX support) incurs, on average, only 19% increase of execution time against an optimized, fault-intoleran...
Date of Conference: 08-10 July 2013
Date Added to IEEE Xplore: 19 September 2013
Electronic ISBN:978-1-4799-0664-2

ISSN Information:

Conference Location: Chania, Greece

References

References is not available for this document.