I. Introduction
Data centers have evolved rapidly over the last few years and now provide a wide variety of cloud services [5], [50], with TCP as the dominant transport-layer protocol. However, the TCP incast problem causes drastic performance degradation when multiple senders synchronously send data to one receiver (i.e., many-to-one communication) over high-bandwidth, low-latency links [12], [62]. As the number of senders increases, the buffers of bottleneck switches can quickly overflow. The resulting packet drops trigger TCP retransmission timeouts (RTOs) lasting hundreds of milliseconds, which reduce goodput (the application-level throughput [39]) by up to 90% [55] and degrade application performance.