Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead


Abstract:

Sorting is a well-investigated topic in computer science, and many efficient sorting algorithms for CPUs and GPUs have been developed. GPUs offer no swapping, paging, or similar mechanism to provide more virtual memory than is physically available, so sorting sequences that exceed GPU memory on the GPU becomes an external sorting problem. In this contribution we present a novel merge-based external sorting algorithm for one or more CUDA-enabled GPUs. We reduce the performance impact of memory transfers to and from the GPU by using an approach similar to regular samplesort and by overlapping memory transfers with GPU computation. We achieve good GPU utilization and load balancing among the GPUs by carefully choosing the samples and the amount of GPU memory used for computation. We demonstrate the performance of our algorithm through extensive testing. Using two GTX 280 GPUs, the implementation outperforms the fastest CPU sorting algorithms known to the authors.
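
The abstract does not give the algorithm itself, so the following is only a minimal CPU-side sketch of the general idea behind a regular-samplesort-style external sort, not the authors' implementation. It assumes that `chunk_size` stands in for the amount of data that fits in GPU memory and that Python's `sorted()` stands in for an in-memory GPU sort. The function names (`sort_chunks`, `choose_splitters`, `external_sort`) and the parameter `samples_per_chunk` are hypothetical. Overlapping transfers with computation and balancing load across several GPUs are not modeled.

```python
import bisect
import heapq
from typing import List


def sort_chunks(data: List[int], chunk_size: int) -> List[List[int]]:
    """Split the input into chunks of at most chunk_size elements and sort each.

    Assumption: sorted() stands in for sorting one memory-sized chunk on a GPU.
    """
    return [sorted(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def choose_splitters(chunks: List[List[int]], samples_per_chunk: int,
                     num_buckets: int) -> List[int]:
    """Pick num_buckets - 1 splitters from evenly spaced samples of each chunk."""
    samples: List[int] = []
    for chunk in chunks:
        step = max(1, len(chunk) // samples_per_chunk)
        samples.extend(chunk[step - 1::step])  # regular (evenly spaced) samples
    samples.sort()
    if not samples:
        return []
    return [samples[(k * len(samples)) // num_buckets] for k in range(1, num_buckets)]


def external_sort(data: List[int], chunk_size: int,
                  samples_per_chunk: int = 4) -> List[int]:
    """Sort data by sorting chunks, partitioning them by splitters, and merging."""
    if not data:
        return []
    chunks = sort_chunks(data, chunk_size)
    splitters = choose_splitters(chunks, samples_per_chunk, len(chunks))

    result: List[int] = []
    lows = [0] * len(chunks)  # start of the not-yet-consumed part of each chunk
    # One pass per bucket; None marks the last bucket, which takes the remainder.
    for splitter in splitters + [None]:
        pieces = []
        for idx, chunk in enumerate(chunks):
            if splitter is None:
                high = len(chunk)
            else:
                high = bisect.bisect_right(chunk, splitter)
            pieces.append(chunk[lows[idx]:high])
            lows[idx] = high
        # Every piece is already sorted, so a k-way merge yields a sorted bucket.
        # Buckets cover disjoint, increasing key ranges, so they concatenate.
        result.extend(heapq.merge(*pieces))
    return result
```

For example, `external_sort(values, chunk_size=32)` returns the same list as `sorted(values)`. Because the splitters come from evenly spaced samples of already sorted chunks, the buckets tend to be of similar size, which is what makes this style of partitioning attractive for keeping each bucket within a fixed memory budget.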
Date of Conference: 19-23 April 2010
Date Added to IEEE Xplore: 24 May 2010
Conference Location: Atlanta, GA, USA
