CUDA made general-purpose computing on Graphics Processing Units (GPUs) popular. However, GPU programming remains error-prone, and many peculiarities must be considered to develop efficient GPU-accelerated applications. Algorithmic skeletons encapsulate typical parallel programming patterns and have emerged as an effective approach to simplifying the development of parallel and distributed applications. In this paper, we present an extension of our skeleton library Muesli with GPU-accelerated data-parallel skeletons for multi-GPU systems and GPU clusters using CUDA. Besides performing the computation on the GPU, these skeletons fully hide data transfers between GPU devices as well as network transfers between compute nodes. Experimental results demonstrate competitive performance for several example applications.