Abstract:
According to the latest TOP500 list, InfiniBand (IB) is the most widely used network architecture in the top 10 supercomputers. IB relies on Credit-based Flow Control (CBFC) to provide a lossless network and on InfiniBand congestion control (IB CC) to relieve congestion. However, this combination can lead to victim flows, since messages are mixed in the same queue, and to long-lived congestion spreading, due to slow convergence. To address these problems, we propose FlowStar, a fast-convergence, per-flow-state, accurate congestion control scheme for InfiniBand. FlowStar includes two core mechanisms: 1) an optimized per-flow CBFC mechanism that provides flow state control to detect real congestion; and 2) rate adjustment rules that make up for the mismatch between the original IB CC rate regulation and the per-hop CBFC to alleviate congestion spreading. FlowStar implements a per-flow congestion state on switches and can obtain in-flight packet information without additional parameter settings to ensure a lossless network. Evaluations show that FlowStar improves average and tail message completion time under different workloads.
Published in: IEEE/ACM Transactions on Networking (Volume: 32, Issue: 3, June 2024)