CUDA made general-purpose computing on Graphics Processing Units (GPUs) popular. However, GPU programming remains error-prone, and many peculiarities must be considered to develop efficient GPU-accelerated applications. Algorithmic skeletons encapsulate typical parallel programming patterns and have emerged as an effective approach to simplifying the development of parallel and distributed applications. In this paper, we present an extension of our skeleton library Muesli with GPU-accelerated data-parallel skeletons for multi-GPU systems and GPU clusters using CUDA. Besides performing the computation on the GPU, these skeletons fully hide data transfers between GPU devices as well as network transfers between compute nodes. Experimental results demonstrate competitive performance for several example applications.