Abstract:
Generalized matrix multiplication (GEMM) involves computing the matrix product of any two matrices with appropriate dimensions. Specifically, GEMM doesn't enforce rules on the structure of the entries of the input matrices, such as requiring them to be diagonal, symmetric, or any other special case. Similarly, GEMM doesn't require the matrices to be of a certain density or sparsity. GEMM is utilized in many practical applications including deep learning with convolutional networks, computer vision, and large-scale signal processing. The advantage of a generalized matrix multiplication algorithm is the ability to process big matrix datasets through efficient memory access techniques with a lower requirement for temporary storage than other methods. We designed a parallel divide and conquer general matrix multiplication (PDCGMM) algorithm that performs GEMM comparably for both sparse and dense matrices. PDCGMM also takes advantage of the parallel processing ability of GPUs by using the Compute Unified Device Architecture (CUDA) and efficient usage of GPU memory. We experimented with PDCGMM on five matrices with different sizes and densities to evaluate the algorithm's performance. PDCGMM demonstrated 5–6 times speedup over NumPy's built-in GEMM algorithm for large matrices and 70–90 times speedup for matrices that could be stored entirely on GPU memory.
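The abstract does not give the details of PDCGMM itself, but the classic divide-and-conquer GEMM scheme it builds on can be sketched as follows. This is an illustrative serial sketch in NumPy, not the authors' parallel CUDA implementation: the quadrant split assumes square matrices whose dimension is a power of two, the `leaf` threshold is an assumed tuning parameter, and each of the eight recursive sub-products is an independent task that a GPU version could dispatch in parallel.

```python
import numpy as np

def dc_matmul(A, B, leaf=64):
    """Divide-and-conquer matrix product C = A @ B.

    Assumes A and B are square with a power-of-two dimension.
    `leaf` is the size below which we fall back to a direct product.
    """
    n = A.shape[0]
    if n <= leaf:
        # Base case: small enough to multiply directly.
        return A @ B

    h = n // 2
    # Partition each matrix into four quadrants.
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Each output quadrant is the sum of two independent sub-products;
    # the eight recursive calls are the units a GPU could run in parallel.
    C = np.empty((n, n), dtype=A.dtype)
    C[:h, :h] = dc_matmul(A11, B11, leaf) + dc_matmul(A12, B21, leaf)
    C[:h, h:] = dc_matmul(A11, B12, leaf) + dc_matmul(A12, B22, leaf)
    C[h:, :h] = dc_matmul(A21, B11, leaf) + dc_matmul(A22, B21, leaf)
    C[h:, h:] = dc_matmul(A21, B12, leaf) + dc_matmul(A22, B22, leaf)
    return C
```

Because each sub-block fits in a small working set, a GPU variant can stage one pair of sub-blocks at a time in device memory, which is what allows matrices larger than GPU memory to be processed.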
Date of Conference: 08-11 March 2023
Date Added to IEEE Xplore: 18 April 2023