QCMP: Load Balancing via In-Network Reinforcement Learning

Authors:
Changgang Zheng, University of Oxford (ORCID: 0000-0003-1894-722X)
Benjamin Rienecker, University of Oxford (ORCID: 0009-0008-2469-8710)
Noa Zilberman, University of Oxford (ORCID: 0000-0002-3655-2873)

FIRA '23: Proceedings of the 2nd ACM SIGCOMM Workshop on Future of Internet Routing & Addressing, September 2023, Pages 35–40
https://doi.org/10.1145/3607504.3609291

Published: 10 September 2023

ABSTRACT

Traffic load balancing is a long-standing networking challenge. The dynamism of traffic and the growing number of distinct workloads flowing through the network exacerbate the problem. This work presents QCMP, a reinforcement-learning-based load balancing solution. QCMP is implemented within the data plane, providing dynamic policy adjustment with quick response to changes in traffic. QCMP is implemented using P4 on a switch ASIC and using BMv2 in a simulation environment. Our results show that QCMP requires negligible resources, runs at line rate, and adapts quickly to changes in traffic patterns.
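
The abstract does not describe QCMP's learning algorithm in detail, so the following is only a generic illustration of how a reinforcement-learning agent could adjust the traffic split between two paths. It is a minimal tabular Q-learning sketch in Python, not the authors' P4 implementation. The two-path topology, the 3:1 capacity ratio, the five weight levels, the reward function, and all names (`reward`, `step`, `choose_action`, `q_update`, `train`) are assumptions made for this example.

```python
import random

# Hypothetical toy setup: two paths, A and B. The state is the weight level w
# of path A (A carries w/4 of the traffic, B carries the rest).
ACTIONS = (-1, 0, 1)  # shift one unit toward B, hold, shift one unit toward A
LEVELS = 5            # w ranges over 0..4


def reward(w, cap_a=3.0, cap_b=1.0):
    """Negative utilisation gap between the two paths (0 means balanced)."""
    share_a = w / (LEVELS - 1)
    return -abs(share_a / cap_a - (1.0 - share_a) / cap_b)


def step(w, action_idx):
    """Apply an action, clamp the weight level, return (next_state, reward)."""
    nxt = min(LEVELS - 1, max(0, w + ACTIONS[action_idx]))
    return nxt, reward(nxt)


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over one row of the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))


def q_update(q, state, action, r, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def train(steps=2000, epsilon=0.2, seed=0):
    """Run the agent in the toy environment and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(LEVELS)]
    state = 0  # start with all traffic on path B
    for _ in range(steps):
        a = choose_action(q, state, epsilon, rng)
        nxt, r = step(state, a)
        q_update(q, state, a, r, nxt)
        state = nxt
    return q
```

In this toy model the reward is highest at weight level 3, where path A carries 75% of the traffic and both paths are equally utilised under the assumed 3:1 capacities. A data-plane implementation would face constraints this sketch ignores, such as integer-only arithmetic and limited per-packet state.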

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
2. Networks
  1. Network services
    1. In-network processing
    2. Programmable networks

Published in

FIRA '23: Proceedings of the 2nd ACM SIGCOMM Workshop on Future of Internet Routing & Addressing
September 2023
44 pages
ISBN:9798400702761
DOI:10.1145/3607504

Copyright © 2023 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Distributed; Reinforcement Learning
In-Network Computing
Load Balancing
Machine Learning
P4
Programmable Switches
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
FIRA '23 paper acceptance rate: 6 of 9 submissions (67%)
Overall acceptance rate: 6 of 9 submissions (67%)

Article Metrics
- Total citations: 1
- Total downloads: 222
- Downloads (last 12 months): 222
- Downloads (last 6 weeks): 33