Abstract:
In this paper, we report on the development of an efficient GPU implementation of the Strassen-Winograd matrix multiplication algorithm for matrices of arbitrary sizes. W...Show MoreMetadata
Abstract:
In this paper, we report on the development of an efficient GPU implementation of the Strassen-Winograd matrix multiplication algorithm for matrices of arbitrary sizes. We utilize multi-kernel streaming to exploit concurrency across sub-matrix operations in addition to intra-operation parallelism. We evaluate the performance of the implementation in comparison with CUBLAS-5.0 on Fermi and Kepler GPUs. The experimental results demonstrate the usefulness of Strassen's algorithm for practically relevant matrix sizes on GPUs, with up to 1.27X speedup for single-precision and 1.42X speedup for double-precision floating point computation.
Date of Conference: 18-21 December 2013
Date Added to IEEE Xplore: 17 April 2014
Electronic ISBN:978-1-4799-0730-4
Print ISSN: 1094-7256