Abstract:
Multi-GPU systems have become a popular platform to meet the ever-growing application demands. However, employing multiple GPUs does not guarantee proportional performanc...Show MoreMetadata
Abstract:
Multi-GPU systems have become a popular platform to meet the ever-growing application demands. However, employing multiple GPUs does not guarantee proportional performance improvements. While prior works have extensively studied the optimizations to mitigate the non-uniform memory accesses (NUMA) overheads, the address translation process also plays an important role in shaping the overall execution performance. In this paper, we investigate the address translation process in multi-GPU systems under unified virtual memory (UVM). We specifically focus on the efficiency of page table walk and identify three major latency penalties: i) queuing for available page table walk threads, ii) memory accesses for page walk cache misses, and iii) handling page faults. Based on our observations, we propose Trans-FW, which short circuits the page table walk by leveraging substantial translation sharing and eager remote translation forwarding. Experimental results on 10 representative multi-GPU applications show that our proposed approach improves the overall performance by 53.8% on average.
Date of Conference: 25 February 2023 - 01 March 2023
Date Added to IEEE Xplore: 24 March 2023
ISBN Information: