Abstract:
This paper presents a new approach to improving performance-per-watt on large-scale hybrid graph applications with sparse data access patterns. The proposed technique takes advantage of demand paging, a technology recently introduced on CPU-GPU systems with heterogeneous memory. The strategy combines an analytical cost model, compiler transformations and a runtime system. The cost model, guided by runtime feedback, judiciously selects data structures for host placement; these are migrated to the GPU during kernel execution via demand paging. We then introduce two new code transformations, kernel blocking and compute colocation, to exploit page-level locality in host-resident data. We evaluate our strategy on four important algorithms in graph analytics: BFS, MST, SSSP and PageRank. Demand paging combined with kernel blocking significantly reduces PCIe traffic and yields an average speedup of 2.46×, and up to a 5× performance improvement on BFS, over state-of-the-art methods. The performance boost does not incur a commensurate increase in GPU power draw, thereby leading to significant gains in energy efficiency. On average, a 2.36× improvement in performance-per-watt is achieved across the four algorithms.
Date of Conference: 21-24 October 2019
Date Added to IEEE Xplore: 13 January 2020