DOI: 10.1145/3342195.3387551
research-article

Improving resource utilization by timely fine-grained scheduling

Published: 17 April 2020

ABSTRACT

A monotask is a unit of work that uses only a single type of resource (e.g., CPU, network, or disk I/O). While monotasks were primarily introduced as a means to reason about job performance, in this paper we show that this fine-grained, resource-oriented abstraction can be leveraged by job schedulers to maximize cluster resource utilization. Although recent cluster schedulers have significantly improved resource allocation, the utilization of the allocated resources is often low because resource requests are inaccurate. In particular, we show that existing scheduling mechanisms are ineffective at handling jobs with dynamic resource usage, which occurs in common workloads, and we propose a resource negotiation mechanism between job schedulers and executors that makes use of monotasks. We design a new framework, called Ursa, which enables the scheduler to capture accurate resource demands dynamically from the execution runtime and to provide timely, fine-grained resource allocation based on monotasks. Ursa also enables high utilization of the allocated resources by the execution runtime. Our experiments show that Ursa improves cluster resource utilization, which translates into improved makespan and average job completion time (JCT).
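To make the monotask abstraction more concrete, the following is a toy sketch and not Ursa's implementation. It models a monotask as a unit of work tied to exactly one resource type, and shows how a scheduler could read per-resource demand directly from the pending monotasks instead of relying on a coarse up-front request. The names `Resource`, `Monotask`, `pending_demand`, and `pick_next`, as well as the abstract demand units, are assumptions made purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Resource(Enum):
    CPU = "cpu"
    NETWORK = "network"
    DISK = "disk"


@dataclass(frozen=True)
class Monotask:
    """A unit of work that uses exactly one resource type."""
    job: str
    resource: Resource
    demand: float  # hypothetical abstract units, not taken from the paper


def pending_demand(monotasks):
    """Sum the outstanding demand per resource type across all monotasks."""
    totals = defaultdict(float)
    for task in monotasks:
        totals[task.resource] += task.demand
    return dict(totals)


def pick_next(monotasks, free):
    """Return the first monotask whose single resource has enough free
    capacity, or None if nothing fits. A naive first-fit policy chosen
    for illustration only."""
    for task in monotasks:
        if free.get(task.resource, 0.0) >= task.demand:
            return task
    return None


# A hypothetical job split into monotasks: fetch input, compute, write output.
queue = [
    Monotask("job-a", Resource.NETWORK, 4.0),
    Monotask("job-a", Resource.CPU, 2.0),
    Monotask("job-a", Resource.DISK, 1.0),
    Monotask("job-b", Resource.CPU, 3.0),
]

demand = pending_demand(queue)
chosen = pick_next(queue, {Resource.CPU: 2.5})
```

Because each monotask names a single resource, the per-resource totals in `demand` follow directly from the queue, and `pick_next` can match work to whichever resource is currently idle. This is the kind of fine-grained matching the abstract describes, though the actual negotiation mechanism in Ursa is not specified here.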


  • Published in

    EuroSys '20: Proceedings of the Fifteenth European Conference on Computer Systems
    April 2020
    49 pages
    ISBN: 9781450368827
    DOI: 10.1145/3342195

    Copyright © 2020 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Acceptance Rates

    EuroSys '20 Paper Acceptance Rate: 43 of 234 submissions, 18%
    Overall Acceptance Rate: 241 of 1,308 submissions, 18%
