DOI: 10.1145/3663408.3663422
research-article

vSwitchLB: Stratified Load Balancing for vSwitch Efficiency in Data Centers

Published: 03 August 2024

Abstract

The virtual switch (vSwitch) is a fundamental element of cloud networks, critical for high-performance and strongly isolated inter-VM forwarding in both local and external networks. Like other multicore systems, a multicore vSwitch suffers from core load imbalance. As a major cloud provider, we pinpoint four cases of core load imbalance within the vSwitch in our cloud, stemming from unequal traffic distribution across virtual queues and RSS buckets, as well as from traffic patterns such as heavy hitters and micro-bursts. To tackle these cases, we present vSwitchLB, a vSwitch load-balancing framework. Specifically, we introduce a load imbalance detection module, accompanied by dedicated techniques that address each type of imbalance. Our preliminary evaluation shows that vSwitchLB can accurately classify the different load imbalances encountered in the vSwitch on our cloud and then prevent any single vSwitch core from being flooded and overwhelmed.
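As a rough illustration of one of the four cases, unequal traffic across RSS buckets, the sketch below measures per-core load from per-bucket load and greedily remaps buckets from the hottest core to the coldest. It is not taken from the paper: the function names, the max-to-mean imbalance metric, and the greedy policy are all assumptions chosen for brevity, and vSwitchLB's actual detection and mitigation techniques may differ.

```python
# Illustrative sketch only: names, metric, and policy are assumptions,
# not the vSwitchLB design.

def core_loads(bucket_load, bucket_to_core, num_cores):
    """Sum the load of every RSS bucket onto the core it is mapped to."""
    loads = [0.0] * num_cores
    for bucket, load in enumerate(bucket_load):
        loads[bucket_to_core[bucket]] += load
    return loads


def imbalance(loads):
    """Ratio of the busiest core's load to the mean load (1.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def rebalance(bucket_load, bucket_to_core, num_cores, max_moves=8):
    """Greedily move buckets from the hottest core to the coldest core.

    A bucket is only moved if its load is smaller than the gap between
    the two cores, so the hotter of the two always ends up less loaded.
    Returns a new bucket-to-core mapping; the input is not modified.
    """
    mapping = list(bucket_to_core)
    for _ in range(max_moves):
        loads = core_loads(bucket_load, mapping, num_cores)
        hot = max(range(num_cores), key=loads.__getitem__)
        cold = min(range(num_cores), key=loads.__getitem__)
        gap = loads[hot] - loads[cold]
        candidates = [
            b for b, core in enumerate(mapping)
            if core == hot and 0 < bucket_load[b] < gap
        ]
        if not candidates:
            break  # no single move can narrow the gap any further
        # Prefer the bucket whose load is closest to half the gap.
        best = min(candidates, key=lambda b: abs(bucket_load[b] - gap / 2))
        mapping[best] = cold
    return mapping
```

For example, calling `rebalance` with eight buckets split evenly across two cores, where one core carries most of the load, yields a mapping whose `imbalance` is no worse than the original. A single heavy-hitter flow that dominates one bucket cannot be fixed by remapping alone, which is consistent with the abstract treating heavy hitters as a separate case.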

Published In

APNet '24: Proceedings of the 8th Asia-Pacific Workshop on Networking
August 2024
230 pages
ISBN: 9798400717581
DOI: 10.1145/3663408
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Center
  2. Load-balancing
  3. vSwitch

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APNet 2024

Acceptance Rates

APNet '24 Paper Acceptance Rate 50 of 118 submissions, 42%;
Overall Acceptance Rate 50 of 118 submissions, 42%
