Enhancing Load Balancing With In-Network Recirculation to Prevent Packet Reordering in Lossless Data Centers


Abstract:

Many existing load balancing mechanisms work effectively in lossy datacenter networks (DCNs), but they suffer from serious packet reordering in lossless Ethernet DCNs deployed with hop-by-hop Priority-based Flow Control (PFC). The key reason is that prior solutions cannot perceive PFC triggering accurately and promptly when making load balancing decisions. Once a forwarding path pauses transmission due to PFC triggering, the packets allocated to it are blocked, inevitably leading to out-of-order packets and retransmissions. In this paper, we present a Reordering-robust Load Balancing (RLB) scheme with PFC prediction in lossless DCNs. At its heart, RLB leverages the derivative of the ingress queue length to predict PFC triggering and proactively notifies the upstream switches to choose an appropriate rerouting path or perform packet recirculation to avoid reordering. Furthermore, under switch failure scenarios, RLB adjusts the recirculation threshold adaptively to mitigate the risk of packet over-recirculation. We have implemented RLB on a hardware programmable switch. As a building block for existing load balancing mechanisms, we have integrated RLB into Presto, LetFlow, Hermes and DRILL. The evaluation results show that the RLB-enhanced solutions deliver significant performance gains by avoiding packet reordering. For example, RLB reduces the 99th percentile flow completion time (FCT) by up to 72%, 67%, 58% and 54% over DRILL, Presto, LetFlow and Hermes, respectively.
Published in: IEEE/ACM Transactions on Networking (Volume: 32, Issue: 5, October 2024)
Page(s): 4114 - 4127
Date of Publication: 27 May 2024
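
The prediction idea described in the abstract (using the derivative of the ingress queue length to anticipate PFC triggering, then rerouting or recirculating upstream) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear extrapolation, the function names, the `horizon_us` look-ahead window, the fixed `recirc_limit`, and the fallback when the limit is reached are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class QueueSample:
    """One observation of an ingress queue (hypothetical structure)."""
    time_us: float
    length_bytes: float


def predicts_pfc(prev: QueueSample, curr: QueueSample,
                 pfc_threshold_bytes: float, horizon_us: float) -> bool:
    """Assumed predictor: extrapolate the queue length linearly using its
    finite-difference derivative and check whether it would reach the PFC
    threshold within the look-ahead horizon."""
    dt = curr.time_us - prev.time_us
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    slope = (curr.length_bytes - prev.length_bytes) / dt  # bytes per microsecond
    projected = curr.length_bytes + slope * horizon_us
    return projected >= pfc_threshold_bytes


def choose_action(path_warned: Dict[str, bool], preferred: str,
                  recirc_count: int, recirc_limit: int) -> Tuple[str, Optional[str]]:
    """Assumed upstream decision: keep the preferred path if it has no
    warning, otherwise reroute to any unwarned path, otherwise recirculate
    until a (hypothetical) recirculation limit is reached."""
    if not path_warned[preferred]:
        return ("forward", preferred)
    for path, warned in path_warned.items():
        if not warned:
            return ("reroute", path)
    if recirc_count < recirc_limit:
        return ("recirculate", None)
    # Assumption: once the limit is hit, fall back to the preferred path.
    return ("forward", preferred)
```

In this sketch a downstream switch would call `predicts_pfc` on successive queue samples and mark the path as warned when it returns `True`; the upstream switch would then call `choose_action` per packet. The abstract states that RLB adapts the recirculation threshold under switch failures; here `recirc_limit` is simply a caller-supplied value.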
