DOI: 10.1145/2465351.2465386

Omega: flexible, scalable schedulers for large compute clusters

Published: 15 April 2013

Abstract

Increasing scale and the need for rapid response to changing requirements are hard to meet with current monolithic cluster scheduler architectures. This restricts the rate at which new features can be deployed, decreases efficiency and utilization, and will eventually limit cluster growth. We present a novel approach to address these needs using parallelism, shared state, and lock-free optimistic concurrency control.
We compare this approach to existing cluster scheduler designs, evaluate how much interference between schedulers occurs and how much it matters in practice, present some techniques to alleviate it, and finally discuss a use case highlighting the advantages of our approach -- all driven by real-life Google production workloads.
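
As an illustration of the shared-state, optimistic approach the abstract names, here is a minimal single-process sketch. It is not the paper's implementation: the `CellState` class, the per-machine version counter used for conflict detection, the "most free CPU" placement policy, and the retry limit are all assumptions made for this example. Each scheduler plans against a private snapshot, then tries to commit its claim; the commit succeeds only if the target machine has not changed since the snapshot was taken.

```python
from dataclasses import dataclass, field


@dataclass
class CellState:
    """Hypothetical shared cluster state: free CPU per machine plus a version counter."""
    free_cpu: dict
    version: dict = field(default_factory=dict)

    def snapshot(self):
        # Each scheduler plans against its own private copy of the shared state.
        versions = {m: self.version.get(m, 0) for m in self.free_cpu}
        return dict(self.free_cpu), versions

    def commit(self, machine, cpu, seen_version):
        """Apply a claim only if the machine is unchanged since the snapshot."""
        if self.version.get(machine, 0) != seen_version:
            return False  # conflict: another scheduler committed first
        if self.free_cpu[machine] < cpu:
            return False  # not enough capacity left
        self.free_cpu[machine] -= cpu
        self.version[machine] = seen_version + 1
        return True


def schedule(cell, cpu, max_attempts=3):
    """Place one task optimistically, re-planning from a fresh snapshot after a conflict."""
    for attempt in range(1, max_attempts + 1):
        free, versions = cell.snapshot()
        candidates = [m for m, c in free.items() if c >= cpu]
        if not candidates:
            return None, attempt
        # Assumed placement policy: pick the machine with the most free CPU.
        machine = max(candidates, key=lambda m: (free[m], m))
        if cell.commit(machine, cpu, versions[machine]):
            return machine, attempt
    return None, max_attempts


# Two schedulers plan from the same snapshot and both want machine m1.
cell = CellState({"m1": 4, "m2": 2})
_, ver_a = cell.snapshot()
_, ver_b = cell.snapshot()
ok_a = cell.commit("m1", 3, ver_a["m1"])  # scheduler A commits first
ok_b = cell.commit("m1", 3, ver_b["m1"])  # scheduler B's stale claim is rejected
placed, attempts = schedule(cell, 2)      # B re-plans with a smaller task
```

In this sketch no scheduler holds a lock while it plans; the only serialized step is the short validate-and-apply in `commit`, and the losing scheduler pays for a conflict by planning again.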

Published In

EuroSys '13: Proceedings of the 8th ACM European Conference on Computer Systems
April 2013
401 pages
ISBN:9781450319942
DOI:10.1145/2465351

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cluster scheduling
  2. optimistic concurrency control

Qualifiers

  • Research-article

Conference

EuroSys '13
EuroSys '13: Eighth Eurosys Conference 2013
April 15 - 17, 2013
Prague, Czech Republic

Acceptance Rates

EuroSys '13 Paper Acceptance Rate 28 of 143 submissions, 20%;
Overall Acceptance Rate 241 of 1,308 submissions, 18%
