
Computer Networks

Volume 157, 5 July 2019, Pages 133-145

Adaptive load balancing based on accurate congestion feedback for asymmetric topologies

https://doi.org/10.1016/j.comnet.2019.04.006

Abstract

Datacenter load balancing schemes exist to facilitate parallel data transmission over multiple paths under various uncertainties such as traffic dynamics and topology asymmetries. Taking deployment challenges into account, several schemes that optimize ECMP (e.g. CLOVE, Hermes) balance load at end hosts. However, these solutions suffer from inaccurate congestion feedback. They either detect congestion through Explicit Congestion Notification (ECN) and coarse-grained Round-Trip Time (RTT) measurements or are congestion-oblivious. Such feedback is not sufficient to indicate the true congestion status under asymmetry. Moreover, when rerouting events occur, outdated ACKs carrying congestion feedback from other paths can improperly influence the current sending rate. Our observations and analyses show that this inaccurate congestion feedback degrades performance.

Therefore, we explore how to address the above problems while remaining compatible with existing switch hardware and the network protocol stack. We propose ALB, an adaptive load balancing mechanism based on accurate congestion feedback that runs at end hosts and is resilient to asymmetry. ALB leverages latency-based congestion detection to precisely reroute new flowlets to paths with lighter load, and an ACK correction method to avoid inaccurate flow rate adjustment. In large-scale simulations, ALB achieves up to 13% and 48% better average flow completion time (FCT) than CONGA and CLOVE-ECN under asymmetry, respectively. Compared with other schemes, ALB improves the average and 99th-percentile FCTs for small flows under highly bursty traffic by 43–174% and 75–129%, respectively. Under dynamic network changes, ALB also provides competitive overall performance and maintains stable performance for small flows.

Introduction

Datacenter networks typically adopt multi-rooted topologies, such as fat tree and leaf-spine, to provide high bisection bandwidth. The multiple paths in these topologies offer several alternative routes between any two end hosts connected to different switches. Balancing load across these paths to fully utilize network resources can improve throughput and reduce latency for datacenter applications. But various uncertainties, such as traffic dynamics and topology asymmetries, pose great challenges for designing efficient load balancing schemes. Production datacenters present a network environment with dynamic traffic [1], hosting applications that are sensitive to bandwidth (e.g. MapReduce) alongside applications that are sensitive to flow completion time (e.g. Memcached). Asymmetry is also common in datacenter networks [2] because of added racks, heterogeneous network devices, cut links and switch malfunctions [3], [4]. An efficient load balancing mechanism must adapt to these uncertainties: it should accurately detect path conditions and distribute traffic among the multiple paths accordingly.

However, Equal-Cost Multi-Path (ECMP) forwarding [5], the standard strategy used today for load balancing in datacenter networks, performs poorly. It assigns each flow to a path permanently, according to a hash of certain tuples in the packet header. Because it accounts for neither path conditions nor flow size, it can waste over 50% of the bisection bandwidth [6].

Therefore, prior solutions (e.g. CONGA [7], CLOVE-ECN [8], FlowBender [9], Hermes [2]) have made a great deal of effort to improve performance, but they still have drawbacks. Distributed load balancing schemes residing in custom switches (e.g. CONGA, HULA [10], LetFlow [11]) are hard to deploy in general datacenter networks, although they achieve significant improvements in throughput and latency. Centralized solutions (e.g. Hedera [6]) schedule large flows globally by collecting network information in a controller. But their long scheduling intervals cannot keep up with the traffic volatility of datacenter networks and are harmful to small flows.

The last category of solutions (e.g. CLOVE-ECN, Hermes) is deployed at network edges (e.g. the hypervisor) or end hosts to remain practical. Some of these are designed to be congestion-aware. However, they rely on coarse congestion feedback (e.g. ECN and coarse-grained RTT measurements) to sense congestion. CLOVE-ECN learns about congestion along network paths from ECN signals and uses a weighted round-robin (WRR) algorithm to dynamically route flowlets [12] over multiple paths. Hermes likewise exploits ECN signals and coarse-grained RTT measurements to decide a flow's path at the host side. RTT measurements lump together the latencies of both directions along a network path; to use RTTs to capture congestion of the forward path, prior mechanisms (e.g. Hermes, TIMELY [13]) place pure ACK packets on the reverse path into a higher-priority queue. Even with well-designed scheduling algorithms, inaccurate ECN signals and coarse-grained RTT measurements degrade the achievable gains in asymmetric topologies. ECN-based congestion detection cannot accurately characterize the degree of congestion among multiple paths in an asymmetric network because its feedback is inherently oversimplified. Coarse-grained RTT measurement includes the end-host network stack delay and is trustworthy only when a sufficiently small RTT is observed; it therefore cannot accurately represent the degree of path congestion. Furthermore, ECN is a passive and delayed mechanism for signaling the congestion level of a path, so it can hardly support timely load balancing.
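As an illustration of the flowlet-plus-WRR idea that schemes like CLOVE-ECN embody, the following Python sketch starts a new flowlet when a flow's idle gap exceeds a threshold and assigns it a path by weighted random choice, shifting weight away from ECN-marked paths. The class name, gap threshold and weight-update constants are hypothetical, not CLOVE-ECN's actual parameters.

```python
import random
import time

class FlowletRouter:
    """Flowlet-based weighted round-robin routing (illustrative sketch).

    A new flowlet starts when the idle gap of a flow exceeds a threshold;
    each new flowlet is assigned a path drawn in proportion to per-path
    weights, which ECN feedback nudges down on marked paths."""

    def __init__(self, paths, gap_us=500):
        self.paths = list(paths)
        self.weights = {p: 1.0 for p in self.paths}  # hypothetical initial weights
        self.gap = gap_us / 1e6                      # flowlet gap threshold (s)
        self.flows = {}                              # flow_id -> (last_seen, path)

    def on_ecn(self, path, marked):
        # Shift weight away from paths whose ACKs carry ECN marks.
        if marked:
            self.weights[path] = max(self.weights[path] * 0.9, 0.1)
        else:
            self.weights[path] = min(self.weights[path] + 0.01, 1.0)

    def route(self, flow_id, now=None):
        now = time.monotonic() if now is None else now
        last = self.flows.get(flow_id)
        if last is not None and now - last[0] < self.gap:
            path = last[1]                           # same flowlet: keep the path
        else:                                        # new flowlet: weighted choice
            path = random.choices(self.paths,
                                  [self.weights[p] for p in self.paths])[0]
        self.flows[flow_id] = (now, path)
        return path
```

Because packets of one flowlet stay on one path, packet reordering within a flowlet is avoided, while each sufficiently long idle gap gives the balancer a safe opportunity to move the flow.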

In fact, end-to-end latency effectively indicates whether a path is congested. With the rapid growth of cloud computing and network functions virtualization (NFV), advances in widely used NIC hardware and efficient packet I/O frameworks (e.g. DPDK [14]) have made it possible to measure end-to-end latency with microsecond accuracy. Latency-based implicit feedback is accurate enough to reveal path congestion [15]. DPDK now supports all major CPU architectures and NICs from multiple vendors (e.g. Intel, Emulex, Mellanox and Cisco). A tuned DPDK solution (e.g. TRex [16]) introduces only 5–10 µs of overhead [17]. With the help of DPDK, end-to-end latency can therefore be measured precisely enough to sense path conditions. Several latency-based congestion control protocols for datacenter networks have emerged (e.g. TIMELY, DX [15]), but latency-based implicit feedback has hardly been applied to load balancing schemes.
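To make the latency signal concrete, here is a minimal sketch of how microsecond-granularity latency samples (such as a DPDK-based data path could supply) might be smoothed per path and compared against a base latency plus a margin. The class, thresholds and EWMA gain are illustrative assumptions, not ALB's actual detection algorithm.

```python
class LatencyMonitor:
    """Per-path latency smoothing and congestion flagging (illustrative sketch)."""

    def __init__(self, base_us, margin_us=50, alpha=0.125):
        self.base = base_us        # assumed uncongested baseline latency (us)
        self.margin = margin_us    # hypothetical congestion margin (us)
        self.alpha = alpha         # EWMA gain, a common smoothing choice
        self.ewma = {}             # path -> smoothed latency (us)

    def sample(self, path, latency_us):
        # Fold a new microsecond latency sample into the per-path EWMA.
        prev = self.ewma.get(path, latency_us)
        self.ewma[path] = (1 - self.alpha) * prev + self.alpha * latency_us
        return self.ewma[path]

    def congested(self, path):
        # A path counts as congested once its smoothed latency
        # exceeds the baseline by more than the margin.
        return self.ewma.get(path, self.base) > self.base + self.margin

    def best_path(self, paths):
        # Prefer the path with the lowest smoothed latency.
        return min(paths, key=lambda p: self.ewma.get(p, self.base))
```

Unlike a single-bit ECN mark, the smoothed latency gives a graded view of how congested each path is, which is what allows flowlets to be steered toward the lighter-loaded path.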

Moreover, current load balancing solutions also introduce a new source of inaccurate congestion feedback in transport protocols. End hosts in today's datacenters commonly adopt ECN-based transport protocols (e.g. DCTCP [18]), whose congestion control algorithms adjust a flow's rate (window) based on the congestion state of the current path. When rerouting events happen, outdated ACKs from the previous path that carry no ECE mark may improperly increase the sending rate (window), while those with an ECE mark will mistakenly decrease it. This problem hinders link bandwidth utilization especially under asymmetric topologies, because asymmetry more easily creates different network conditions on different routing paths.
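The staleness problem can be illustrated with a small sketch: if each segment is tagged with the path it traversed, the sender can discard ECN feedback that describes a path the flow no longer uses. The class and window rules below are hypothetical simplifications of DCTCP-style control, not ALB's actual ACK correction method.

```python
class AckFilter:
    """Discarding stale per-path ECN feedback after a reroute (illustrative sketch)."""

    def __init__(self, cwnd=10):
        self.cwnd = cwnd           # congestion window in segments (hypothetical)
        self.current_path = None   # path the flow is currently routed on

    def reroute(self, path):
        # Record the new path; in-flight ACKs still describe the old one.
        self.current_path = path

    def on_ack(self, ack_path, ece):
        if ack_path != self.current_path:
            return self.cwnd                       # stale feedback: ignore it
        if ece:
            self.cwnd = max(self.cwnd // 2, 1)     # back off on congestion
        else:
            self.cwnd += 1                         # simple additive increase
        return self.cwnd
```

Without the path check, an ECE-marked ACK left over from a congested old path would halve the window even though the flow has already moved to a lighter path, which is exactly the mis-adjustment described above.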

From the above observations, we find that inaccurate congestion feedback causes both inaccurate detection of path conditions and incorrect flow rate adjustment, which is bound to hurt load balancing performance (Sections 2.2 and 2.3). Therefore, we ask: can we design a congestion-aware load balancing scheme that achieves accurate congestion feedback while remaining practical? We present ALB to answer this question: an adaptive load balancing solution implemented at end hosts. ALB employs accurate latency-based measurement to detect network path congestion, which enables it to reroute flows precisely. An ACK correction method is used to avoid blindly adjusting the flow rate at source hosts.

We make the following contributions in this paper:

  • We analyze how inaccurate congestion feedback degrades load balancing performance under asymmetry.

  • We present ALB, an adaptive load balancing mechanism based on accurate congestion feedback running at end hosts, which is resilient to asymmetry and readily deployable with commodity switches in large-scale datacenters.

  • In large-scale simulations we show that ALB achieves up to 13% and 48% better flow completion time than CONGA and CLOVE-ECN under asymmetry, respectively. Under dynamic network changes, ALB improves the overall average FCT by 5–42% compared to CLOVE-ECN, and it consistently delivers the best and most stable performance for small flows under highly bursty traffic. Compared with Hermes, ALB requires no complicated parameter settings and provides competitive performance.

Some preliminary results of this paper were published in the Proceedings of the IEEE/ACM International Symposium on Quality of Service (IWQoS, 2018) [19]. In this paper, we describe our motivation with more detailed theoretical and empirical analyses, improve the latency-based congestion detection mechanism (Section 3.2) and extend the evaluations for dynamic datacenter network changes (Section 4.2.2).

The rest of this paper is organized as follows. In the next section, we introduce the background and motivation for designing ALB. We then detail ALB in Section 3, and evaluate it against other solutions in Section 4. Finally, we briefly review related work in Section 5 and conclude in Section 6.

Section snippets

Background and motivation

In this section, we describe how network asymmetry and traffic dynamics pose challenges to load balancing, and how inaccurate congestion feedback exacerbates the resulting performance loss. These problems motivate the design of ALB.

Overview

We present ALB’s framework in Fig. 5. ALB contains two modules: MDCTCP and the ALB core. We design MDCTCP by slightly modifying DCTCP; it remains an ECN-based transport protocol. Three other functions, namely source routing, latency-based congestion detection and accurate flowlet switching, run in the ALB core. The ALB core is implemented in software in the hypervisor vSwitch (e.g. Open vSwitch), which current multi-tenant datacenters commonly use to manage numerous virtual machines.

The

Evaluation

We evaluate ALB via the discrete-event network simulator NS3 [22]. Our evaluation seeks to answer the following questions:

How does each design component contribute to performance? (Section 4.1) ALB implements a novel latency-based congestion detection and an ACK correction method at end hosts. We evaluate the benefits of these two methods separately. Results show that each contributes around 10% of the overall performance improvement under heavy loads.

How does ALB perform under

Related work

We briefly discuss related work that has informed and inspired our design.

Hedera [6], MicroTE [28] and FastPass [29] use a centralized scheduler to monitor global network state and schedule flows evenly across multiple paths. They cannot react in time to latency-sensitive application requests and have difficulty handling traffic volatility.

Presto [24], DRB [23] and Flowbender [9] are per-flowcell/packet/flow based, congestion-oblivious load balancing solutions. They cannot effectively

Conclusion

We propose ALB, an adaptive load balancing mechanism based on accurate congestion feedback running at end hosts with commodity switches, which is resilient to asymmetry. ALB leverages latency-based congestion detection to precisely route flowlets to lightly loaded paths, and an ACK correction method to avoid inaccurate flow rate adjustment. We evaluate ALB through large-scale simulations. Our results show that compared to schemes which require custom switch hardware for implementation, ALB

Acknowledgments

This work is supported in part by NSFC No. 61772216, National Defense Preliminary Research Project (31511010202), the National High Technology Research and Development Program (863 Program) of China under Grant no. 2013AA013203; Hubei Province Technical Innovation Special Project (2017AAA129), Wuhan Application Basic Research Project (2017010201010103), Project of Shenzhen Technology Scheme (JCYJ20170307172248636), Fundamental Research Funds for the Central Universities. This work is also

Qingyu Shi He received the BE degree in computer science and technology from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2014. He is currently a Ph.D. student majoring in Computer Architecture in Wuhan National Laboratory for Optoelectronics (WNLO). His current research interests include software-defined networking and load balancing for datacenter networks. He has a publication in international conference: IWQoS.

References (29)

  • T. Benson et al., Network traffic characteristics of data centers in the wild, Proceedings of the ACM IMC, 2010.
  • H. Zhang et al., Resilient datacenter load balancing in the wild, Proceedings of the ACM SIGCOMM, 2017.
  • P. Gill et al., Understanding network failures in data centers: measurement, analysis, and implications, Proceedings of the ACM SIGCOMM, 2011.
  • C. Guo, L. Yuan, D. Xiang, Y. Dang, R. Huang, D. Maltz, Z. Liu, V. Wang, B. Pang, H. Chen, Z.-W. Lin, V. Kurien,...
  • C. Hopps, Analysis of an equal-cost multi-path algorithm, RFC 2992, 2000.
  • M. Al-Fares et al., Hedera: dynamic flow scheduling for data center networks, Proceedings of the USENIX NSDI, 2010.
  • M. Alizadeh et al., CONGA: distributed congestion-aware load balancing for datacenters, Proceedings of the ACM SIGCOMM, 2014.
  • N. Katta et al., Clove: congestion-aware load balancing at the virtual edge, Proceedings of the ACM CoNEXT, 2017.
  • A. Kabbani et al., Flowbender: flow-level adaptive routing for improved latency and throughput in datacenter networks, Proceedings of the ACM CoNEXT, 2014.
  • N. Katta et al., HULA: scalable load balancing using programmable data planes, Proceedings of the ACM SOSR, 2016.
  • E. Vanini et al., Let it flow: resilient asymmetric load balancing with flowlet switching, Proceedings of the USENIX NSDI, 2017.
  • S. Kandula et al., Dynamic load balancing without packet reordering, ACM SIGCOMM Comput. Commun. Rev., 2007.
  • R. Mittal et al., TIMELY: RTT-based congestion control for the datacenter, Proceedings of the ACM SIGCOMM, 2015.
  • Intel DPDK: Data Plane Development Kit (Accessed 9 October 2010). [Online]. Available:...

    Fang Wang She received her BE degree and Master degree in computer science in 1994, 1997, and Ph.D. degree in computer architecture in 2001 from Huazhong University of Science and Technology (HUST), China. She is a professor of computer science and engineering at HUST. Her interests include distribute file systems, parallel I/O storage systems and graph processing systems. She has more than 50 publications in major journals and international conferences, including FGCS, ACM TACO, SCIENCE CHINA Information Sciences, Chinese Journal of Computers and HiPC, ICDCS, HPDC, ICPP.

    Dan Feng She received the BE, ME, and Ph.D. degrees in Computer Science and Technology in 1991, 1994, and 1997, respectively, from Huazhong University of Science and Technology (HUST), China. She is a professor and vice dean of the School of Computer Science and Technology, HUST. Her research interests include computer architecture, massive storage systems, and parallel file systems. She has more than 100 publications in major journals and international conferences, including IEEE-TC, IEEE-TPDS, ACM-TOS, JCST, FAST, USENIX ATC, ICDCS, HPDC, SC, ICS, IPDPS, and ICPP. She serves on the program committees of multiple international conferences, including SC 2011, 2013 and MSST 2012. She is a member of IEEE and a member of ACM.

    Weibin Xie He received the BE degree in Energy and Power Engineering from the China University of Mining and Technology (CUMT), Xuzhou, China, in 2011. He is currently a Ph.D. student majoring in Computer Architecture in HUST. His current research interests include Computer networks and protocols and distributed storage systems. He has several publications in major journals and international conferences, including CN and IWQoS.
