Abstract:
RDMA is becoming prevalent because of its low latency, high throughput, and low CPU overhead. However, in current datacenters, RDMA remains a single-path transport, which is prone to failures and fails to fully utilize the rich parallel network paths. Unlike previous multi-path approaches, which mainly focus on TCP, this paper presents a multi-path transport for RDMA, i.e., MP-RDMA, which efficiently utilizes the rich network paths in datacenters. MP-RDMA employs three novel techniques to address the challenge of the limited on-chip memory size of RDMA NICs: 1) a multi-path ACK-clocking mechanism to distribute traffic in a congestion-aware manner without incurring per-path states; 2) an out-of-order aware path selection mechanism to control the level of out-of-order delivered packets, thus minimizing the metadata required to track them; 3) a synchronization mechanism to ensure in-order memory updates whenever needed. With all these techniques, MP-RDMA adds only 66B to each connection state compared to single-path RDMA. Our evaluation with an FPGA-based prototype demonstrates that, compared with single-path RDMA, MP-RDMA can significantly improve robustness under failures (2× to 4× higher throughput under 0.5% to 10% link loss ratio) and improve overall network utilization by up to 47%.
Published in: IEEE/ACM Transactions on Networking (Volume: 27, Issue: 6, December 2019)