Abstract
In multi-resource clusters, many schedulers allocate resources in fixed quantities. However, fixed allocations can easily lead to resource fragmentation and over-commitment, which may lower resource utilization and degrade performance. This paper proposes a fine-grained method (FGM) that refines the granularity of resource allocation. The method divides each task into execution stages according to its resource requirements, which are estimated at runtime from similar tasks. Task requirements are then matched with the available server resources stage by stage, refining two aspects of allocation granularity: allocation duration and allocation quantity. In addition, the FGM may deliberately over-allocate resources to further improve resource utilization and performance. The FGM was tested in three environments using both online and offline workloads. The results show that the FGM can resolve resource fragmentation and over-commitment problems, significantly improving resource utilization and performance while maintaining acceptable fairness and scheduling response times.
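The difference between fixed-quantity and stage-by-stage allocation can be illustrated with a minimal sketch. It is not the FGM implementation: the two-resource model (CPU cores, memory), the discrete time slots, and the names `Stage`, `stage_profile`, `peak_profile`, `fits`, `commit` and `overcommit` are all assumptions made for illustration, and the stage demands are supplied by hand rather than estimated from similar tasks.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical two-resource model: (cpu_cores, memory_gb).
Resources = Tuple[float, float]


@dataclass
class Stage:
    duration: int      # number of time slots the stage lasts
    demand: Resources  # assumed (not estimated) demand during the stage


def stage_profile(stages: List[Stage]) -> List[Resources]:
    """Stage-wise allocation: one demand entry per time slot."""
    return [s.demand for s in stages for _ in range(s.duration)]


def peak_profile(stages: List[Stage]) -> List[Resources]:
    """Fixed-quantity allocation: reserve the peak demand for the whole run."""
    peak = tuple(max(s.demand[i] for s in stages) for i in range(2))
    return [peak] * sum(s.duration for s in stages)


def fits(capacity: Resources, committed: List[Resources],
         profile: List[Resources], overcommit: float = 1.0) -> bool:
    """Check that the profile fits in every slot, starting at slot 0.

    overcommit > 1.0 models deliberate over-allocation beyond capacity.
    """
    limit = [c * overcommit for c in capacity]
    for slot, demand in enumerate(profile):
        used = committed[slot] if slot < len(committed) else (0.0, 0.0)
        if any(used[i] + demand[i] > limit[i] for i in range(2)):
            return False
    return True


def commit(committed: List[Resources],
           profile: List[Resources]) -> List[Resources]:
    """Return a new timeline with the profile added from slot 0."""
    result = list(committed)
    while len(result) < len(profile):
        result.append((0.0, 0.0))
    for slot, demand in enumerate(profile):
        result[slot] = tuple(result[slot][i] + demand[i] for i in range(2))
    return result


capacity = (4.0, 8.0)
# Task A is CPU-heavy first and memory-heavy later; task B is the reverse.
task_a = [Stage(2, (3.0, 2.0)), Stage(2, (1.0, 6.0))]
task_b = [Stage(2, (1.0, 6.0)), Stage(2, (3.0, 2.0))]

# Fixed quantities: both tasks reserve their peak, so B does not fit next to A.
fixed_timeline = commit([], peak_profile(task_a))
fixed_ok = fits(capacity, fixed_timeline, peak_profile(task_b))

# Stage-wise quantities: the complementary stages share the server.
staged_timeline = commit([], stage_profile(task_a))
staged_ok = fits(capacity, staged_timeline, stage_profile(task_b))

print("fixed allocation fits both tasks:", fixed_ok)
print("stage-wise allocation fits both tasks:", staged_ok)
```

In this toy case the two tasks have complementary stages, so reserving per-stage quantities for per-stage durations lets them share a server that peak-based reservation would treat as full. Passing `overcommit` greater than 1.0 to `fits` is one simple way to model deliberate over-allocation; how the FGM chooses when and how much to over-allocate is not specified in the abstract.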
Acknowledgements
This work was supported by the National Key Research and Development Program of China (No. 2016YFB0200902 to X. Zhang); and the National Natural Science Foundation of China (No. 61572394 to X. Dong).
Cite this article
Zhou, M., Dong, X., Chen, H. et al. Fine-grained scheduling in multi-resource clusters. J Supercomput 76, 1931–1958 (2020). https://doi.org/10.1007/s11227-018-2505-4
DOI: https://doi.org/10.1007/s11227-018-2505-4