
Gödel: Unified Large-Scale Resource Management and Scheduling at ByteDance

Authors:
Wu Xiang, ByteDance Inc. (ORCID 0009-0002-6733-0947)
Yakun Li, ByteDance Inc. (ORCID 0009-0003-2921-8310)
Yuquan Ren, ByteDance Inc. (ORCID 0009-0003-0015-5731)
Fan Jiang, ByteDance Inc. (ORCID 0009-0003-1304-9292)
Chaohui Xin, ByteDance Inc. (ORCID 0009-0009-2488-8538)
Varun Gupta, ByteDance Inc. (ORCID 0009-0003-9192-6692)
Chao Xiang, ByteDance Inc. (ORCID 0009-0004-0510-8958)
Xinyi Song, ByteDance Inc. (ORCID 0009-0005-0107-580X)
Meng Liu, ByteDance Inc. (ORCID 0009-0005-1720-5349)
Bing Li, ByteDance Inc. (ORCID 0009-0002-7817-5479)
Kaiyang Shao, ByteDance Inc. (ORCID 0009-0002-9469-8077)
Chen Xu, ByteDance Inc. (ORCID 0009-0008-3070-3395)
Wei Shao, ByteDance Inc. (ORCID 0009-0008-9555-7704)
Yuqi Fu, University of Virginia and ByteDance (ORCID 0009-0009-6556-9162)
Wilson Wang, ByteDance Inc. (ORCID 0009-0009-6800-8061)
Cong Xu, ByteDance Inc. (ORCID 0009-0005-5657-3291)
Wei Xu, ByteDance Inc. (ORCID 0009-0003-3955-2021)
Caixue Lin, ByteDance Inc. (ORCID 0009-0003-0570-6352)
Rui Shi, ByteDance Inc. (ORCID 0009-0003-9122-4703)
Yuming Liang, ByteDance Inc. (ORCID 0009-0008-3038-6158)

SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing, October 2023, pages 308–323. https://doi.org/10.1145/3620678.3624663

Published: 31 October 2023

ABSTRACT

Over the last few years, our compute infrastructure at ByteDance has expanded significantly due to rapid business growth. To keep pace with this hyper-scale growth, some business groups resorted to managing their own compute infrastructure stacks running different scheduling systems, such as Kubernetes and YARN. This created two major pain points: increasing resource fragmentation across business groups and inadequate resource elasticity between workloads of different business priorities. Isolating business groups (and their compute infrastructure management) from one another leads to inefficient resource utilization and prevents us from serving the business's growth needs in the long run.

To meet these challenges, we propose Gödel, a resource management and scheduling system that provides a unified compute infrastructure on which all business groups run their diverse workloads in a single resource pool. Gödel co-locates various workloads on every machine to achieve better resource utilization and elasticity. It is built upon Kubernetes, the de facto open-source container orchestration system, with significant components replaced or enhanced to accommodate various workloads at large scale. In production, it manages clusters of tens of thousands of machines, achieves overall resource utilization of over 60%, and reaches a scheduling throughput of up to 5,000 pods per second. This paper reports on the design and implementation of Gödel and discusses the lessons and best practices we learned while developing and operating it in production at ByteDance's scale.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing

Published in

SoCC '23: Proceedings of the 2023 ACM Symposium on Cloud Computing
October 2023
624 pages
ISBN:9798400703874
DOI:10.1145/3620678

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Cluster
Resource Management
Schedule
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 169 of 722 submissions, 23%