
Gödel: Unified Large-Scale Resource Management and Scheduling at ByteDance

Published: 31 October 2023

ABSTRACT

Over the last few years, our compute infrastructure at ByteDance has expanded significantly to keep pace with rapid business growth. Along the way, some business groups resorted to managing their own compute infrastructure stacks running different scheduling systems, such as Kubernetes and YARN. This created two major pain points: increasing resource fragmentation across business groups and inadequate resource elasticity between workloads of different business priorities. Isolating each business group's compute infrastructure leads to inefficient resource utilization and prevents us from serving business growth needs in the long run.

To meet these challenges, we propose a resource management and scheduling system named Gödel, which provides a unified compute infrastructure for all business groups to run their diverse workloads in a single resource pool. It co-locates various workloads on every machine to achieve better resource utilization and elasticity. Gödel is built upon Kubernetes, the de facto open-source container orchestration system, but with significant components replaced or enhanced to accommodate diverse workloads at large scale. In production, it manages clusters with tens of thousands of machines, achieves overall resource utilization of over 60%, and sustains a scheduling throughput of up to 5,000 pods per second. This paper reports on the design and implementation of Gödel and discusses the lessons and best practices we learned in developing and operating it in production at ByteDance's scale.
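
As a rough illustration of the co-location idea described above, the following is a minimal, self-contained Go sketch of priority-aware placement on a shared node pool. It is not Gödel's actual algorithm; the Pod, Node, and schedule names, the priority values, and the "most free CPU" heuristic are hypothetical choices made only for this example.

// A minimal, hypothetical sketch of priority-aware co-location on a shared
// node pool. It is NOT Gödel's actual scheduler; all types, names, and the
// placement heuristic are invented for illustration.
package main

import (
	"fmt"
	"sort"
)

type Pod struct {
	Name     string
	Priority int // higher value = more business-critical
	CPUMilli int // requested CPU in millicores
}

type Node struct {
	Name     string
	FreeCPU  int // remaining allocatable CPU in millicores
	Assigned []string
}

// schedule places pods onto nodes, highest priority first, preferring the
// node with the most free CPU. Pods that do not fit anywhere stay pending.
func schedule(pods []Pod, nodes []*Node) {
	sort.Slice(pods, func(i, j int) bool { return pods[i].Priority > pods[j].Priority })
	for _, p := range pods {
		sort.Slice(nodes, func(i, j int) bool { return nodes[i].FreeCPU > nodes[j].FreeCPU })
		if len(nodes) > 0 && nodes[0].FreeCPU >= p.CPUMilli {
			nodes[0].FreeCPU -= p.CPUMilli
			nodes[0].Assigned = append(nodes[0].Assigned, p.Name)
		} else {
			fmt.Printf("pod %s is pending: no node has %dm free\n", p.Name, p.CPUMilli)
		}
	}
}

func main() {
	// One shared pool for all workload classes instead of per-group clusters.
	nodes := []*Node{
		{Name: "node-a", FreeCPU: 4000},
		{Name: "node-b", FreeCPU: 2000},
	}
	pods := []Pod{
		{Name: "online-web", Priority: 100, CPUMilli: 3000}, // latency-sensitive microservice
		{Name: "batch-etl", Priority: 10, CPUMilli: 2000},   // best-effort batch job
		{Name: "ml-training", Priority: 50, CPUMilli: 1500}, // elastic training workload
	}
	schedule(pods, nodes)
	for _, n := range nodes {
		fmt.Printf("%s: %v (free %dm)\n", n.Name, n.Assigned, n.FreeCPU)
	}
}

Running the sketch places the online and training pods on the shared nodes and leaves the batch job pending once free capacity is exhausted, which is the kind of priority-driven elasticity trade-off a unified pool must arbitrate and which Gödel handles far more elaborately in production.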


Published in

SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, October 2023, 624 pages
ISBN: 9798400703874
DOI: 10.1145/3620678
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
