DOI: 10.1145/3307650.3322224

Interplay between hardware prefetcher and page eviction policy in CPU-GPU unified virtual memory

Published: 22 June 2019

Abstract

Memory capacity in GPGPUs is a major challenge for data-intensive applications with their ever-increasing memory requirements. To fit a workload into the limited GPU memory space, a programmer needs to manually divide the workload by tiling the working set and performing user-level data migration. To relieve the programmer of this burden, Unified Virtual Memory (UVM) was developed to support on-demand paging and migration, transparently to the user. It further handles memory oversubscription by automatically performing page replacement when GPU memory is oversubscribed. However, we found that naïve handling of page faults can cause orders-of-magnitude slowdowns in performance. Moreover, we observed that although prefetching data from the CPU to the GPU can hide page-fault latency, different prefetching mechanisms can lead to drastically different performance results. To this end, we performed extensive experiments on GeForce GTX 1080 Ti GPUs over PCIe 3.0 x16 and discovered that an effective prefetch mechanism exists that enhances locality in GPU memory. However, as GPU memory fills to capacity, such a prefetching mechanism quickly proves counterproductive because of the locality-unaware eviction policy. This necessitates the design of new eviction policies that are aware of the hardware prefetcher's semantics. We propose two new programmer-agnostic, locality-aware pre-eviction policies that leverage the mechanics of the existing hardware prefetcher and thus incur no additional implementation or performance overhead. We demonstrate that combining the proposed tree-based pre-eviction policy with the hardware prefetcher provides average speedups of 93% and 18.5% over LRU-based 4KB and 2MB page replacement strategies, respectively. We further examine the memory access patterns of the GPU workloads under consideration to analyze the achieved speedups.
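For readers unfamiliar with UVM, the sketch below is a minimal illustration (the kernel name `scale`, the allocation size, and the explicit prefetch hint are illustrative choices, not taken from the paper) of how CUDA Unified Memory is typically used: cudaMallocManaged allocates memory accessible from both CPU and GPU that is migrated to the GPU on demand at page-fault time, while cudaMemPrefetchAsync lets the programmer issue an explicit migration hint. The hardware prefetcher and pre-eviction policies studied in the paper operate underneath this API, inside the UVM runtime, when faults and oversubscription occur.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel; name and workload are not from the paper.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1ull << 28;   // 1 GiB of floats; size is illustrative
    float *data = nullptr;

    // Unified (managed) allocation: accessible from both CPU and GPU.
    // Pages migrate to the GPU on demand when the kernel first touches them.
    cudaMallocManaged(&data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // populated on the CPU

    int device = 0;
    cudaGetDevice(&device);

    // Optional hint: asynchronously prefetch the range to the GPU so the
    // kernel does not pay per-page fault latency. This is an explicit,
    // user-issued analogue of the driver/hardware prefetching the paper studies.
    cudaMemPrefetchAsync(data, n * sizeof(float), device, 0);

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);   // pages migrate back on CPU access
    cudaFree(data);
    return 0;
}
```

Without the explicit prefetch call, the first GPU access to each page triggers a far-fault serviced over PCIe, which is the latency the hardware prefetcher tries to hide; once the working set exceeds GPU memory, the UVM runtime must also evict previously migrated pages back to host memory, which is where the proposed pre-eviction policies come into play.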

Published In

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture
June 2019
849 pages
ISBN: 9781450366694
DOI: 10.1145/3307650

In-Cooperation

  • IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 June 2019

Author Tags

  1. gpu
  2. hardware prefetcher
  3. page eviction policy
  4. unified virtual memory

Qualifiers

  • Research-article

Conference

ISCA '19

Acceptance Rates

ISCA '19 paper acceptance rate: 62 of 365 submissions (17%)
Overall acceptance rate: 543 of 3,203 submissions (17%)
