DOI: 10.1145/3607504.3609291
research-article

QCMP: Load Balancing via In-Network Reinforcement Learning

Published: 10 September 2023

ABSTRACT

Traffic load balancing is a long-standing networking challenge. The dynamism of traffic and the growing number of distinct workloads flowing through the network exacerbate the problem. This work presents QCMP, a reinforcement-learning-based load balancing solution. QCMP is implemented within the data plane, providing dynamic policy adjustment with quick response to changes in traffic. It is implemented in P4 on a switch ASIC and on BMv2 in a simulation environment. Our results show that QCMP requires negligible resources, runs at line rate, and adapts quickly to changes in traffic patterns.
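
The abstract does not spell out the learning rule, so the following is only a generic sketch of tabular Q-learning applied to a toy two-path load balancing decision, not QCMP's actual design. The state encoding, the action names, the reward value, and the parameters `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration.

```python
import random

# Hypothetical action set for a two-path switch (not taken from the paper).
ACTIONS = ("shift_to_path0", "hold", "shift_to_path1")

# Hypothetical states: 0 = path 0 busier, 1 = balanced, 2 = path 1 busier.
NUM_STATES = 3


def q_update(q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s_next, a') - Q(s, a))."""
    best_next = max(q[s_next])
    q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
    return q[s][a]


def greedy(q, s):
    """Index of the highest-valued action in state s (first one on ties)."""
    row = q[s]
    return max(range(len(row)), key=row.__getitem__)


def epsilon_greedy(q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    return greedy(q, s)


# Start with an all-zero table: one row per state, one column per action.
q_table = [[0.0] * len(ACTIONS) for _ in range(NUM_STATES)]

# Assumed experience: path 0 was busier (state 0), weight was shifted toward
# path 1 (action 2), the load became balanced (state 1), reward 1.0.
first = q_update(q_table, s=0, a=2, reward=1.0, s_next=1)
```

After this single update the table prefers shifting weight toward path 1 whenever path 0 is the busier one. A data-plane implementation would likely need integer or fixed-point arithmetic in place of the floats used here.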

        • Published in

          FIRA '23: Proceedings of the 2nd ACM SIGCOMM Workshop on Future of Internet Routing & Addressing
          September 2023
          44 pages
          ISBN: 9798400702761
          DOI: 10.1145/3607504

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited

          Acceptance Rates

          FIRA '23 Paper Acceptance Rate: 6 of 9 submissions, 67%
          Overall Acceptance Rate: 6 of 9 submissions, 67%
