Abstract:
Nonnegative Matrix Factorization (NMF) has attracted great interest because of its ability to extract highly interpretable parts from data sets. Gene expression analysis is one of its most popular applications in Bioinformatics. Nonetheless, its use is hindered by its computational cost when processing large data sets. In this paper, we present two parallel implementations of NMF. The first uses CUDA on a single Graphics Processing Unit (GPU): large input matrices are transferred to the device and processed block by block, iteratively. The second distributes the data among multiple GPUs synchronized through MPI (Message Passing Interface). When analyzing large data sets with two and four GPUs, it runs 2.3 and 4.13 times faster, respectively, than the single-GPU version, which is about 120 times faster than a conventional CPU. These superlinear speedups are achieved when the data portion assigned to each GPU is small enough to be transferred only once.
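As a rough illustration of the blockwise idea, the sketch below factorizes V ≈ WH while touching V only one block of rows at a time. It is a CPU-only NumPy sketch, not the CUDA/MPI implementation described above. Several things here are assumptions: the abstract does not say which update rule the paper uses, so the sketch uses the standard Lee-Seung multiplicative updates for the Frobenius objective, and the function name `nmf_blockwise` and the parameters `block_rows` and `eps` are hypothetical.

```python
import numpy as np

def nmf_blockwise(V, k, n_iter=200, block_rows=64, eps=1e-9, seed=0):
    """Approximate V (n x m, nonnegative) as W @ H with W (n x k), H (k x m).

    Assumption: Lee-Seung multiplicative updates (Frobenius objective).
    V is read one block of rows at a time, mimicking blockwise transfers.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps

    for _ in range(n_iter):
        # H update: W^T V and W^T W are sums over row blocks,
        # so they can be accumulated without holding all of V at once.
        WtV = np.zeros((k, m))
        WtW = np.zeros((k, k))
        for s in range(0, n, block_rows):
            Wb = W[s:s + block_rows]
            Vb = V[s:s + block_rows]
            WtV += Wb.T @ Vb
            WtW += Wb.T @ Wb
        H *= WtV / (WtW @ H + eps)

        # W update: given H, each row block of W depends only on
        # the matching row block of V, so blocks are independent.
        HHt = H @ H.T
        for s in range(0, n, block_rows):
            Vb = V[s:s + block_rows]
            Wb = W[s:s + block_rows]
            W[s:s + block_rows] = Wb * (Vb @ H.T) / (Wb @ HHt + eps)

    return W, H
```

Because the row blocks in the second loop are independent, they could in principle be assigned to different devices, with only the small k x m and k x k accumulators needing to be combined across devices. Whether the paper partitions the data this way is not stated in the abstract.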
Date of Conference: 22-24 November 2011
Date Added to IEEE Xplore: 02 January 2012