ABSTRACT
Layer 4 (L4) load balancing is crucial in cloud computing and elastic microservices. Existing L4 load balancer designs fall into two main categories: centralized designs that use a hardware or software middlebox, and decentralized designs in which every node can act as the load balancer. Centralized designs offer better scheduling policies and easier worker node management, but suffer from I/O and CPU limitations. Decentralized designs scale better, but are harder to manage. We introduce HEELS, a novel load balancing scheme designed for internal cloud workloads and microservices that achieves the best of both worlds. HEELS uses the load balancer only during connection establishment and lets clients and servers communicate directly afterwards. Because it supports general L4 load balancers and requires no kernel changes, HEELS is readily deployable on the public cloud. We implement HEELS as a set of eBPF programs split across the client and server. Our evaluation shows that HEELS introduces minimal overhead, works with off-the-shelf load balancers (e.g., Katran by Meta), and significantly reduces the cost of cloud load balancers.
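The central idea, that the load balancer is consulted only when a connection is established and later traffic flows directly between client and server, can be illustrated with a small toy model. This is a sketch under stated assumptions, not the HEELS implementation: the abstract does not describe the actual eBPF mechanism, so the class names, the round-robin policy, and the per-connection table below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class LoadBalancer:
    """Hypothetical L4 load balancer that counts how many packets it handles."""
    backends: list
    packets_seen: int = 0

    def __post_init__(self):
        # Round-robin is an assumption; any scheduling policy could be used.
        self._rr = cycle(self.backends)

    def pick_backend(self):
        self.packets_seen += 1
        return next(self._rr)


@dataclass
class Client:
    """Hypothetical client host that remembers the backend chosen per connection."""
    lb: LoadBalancer
    direct: dict = field(default_factory=dict)  # conn_id -> backend

    def send(self, conn_id, payload):
        if conn_id not in self.direct:
            # Connection establishment: the only step that involves the load balancer.
            self.direct[conn_id] = self.lb.pick_backend()
        # Every packet after establishment goes straight to the chosen backend.
        return self.direct[conn_id], payload


if __name__ == "__main__":
    lb = LoadBalancer(["server-1", "server-2"])
    client = Client(lb)
    for i in range(3):
        print(client.send("conn-a", f"packet {i}"))
    print(client.send("conn-b", "packet 0"))
    print("packets handled by the load balancer:", lb.packets_seen)
```

In this model the load balancer handles one packet per connection regardless of how much data the connection carries, which is the property the abstract credits for avoiding the I/O and CPU limits of centralized designs.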
Index Terms
- HEELS: A Host-Enabled eBPF-Based Load Balancing Scheme