Abstract
Cluster computing is rapidly gaining popularity as a choice for high performance computing, mainly because of its favorable cost-performance ratio. Resource management systems (RMS) are the key component for managing cluster resources efficiently, and they play a vital role in the performance of distributed parallel systems, especially through the job scheduling module. In this paper, we empirically evaluate four resource management systems (SGE, TORQUE, MAUI Scheduler, and SLURM), with special focus on the job scheduler component. The schedulers are evaluated on a comprehensive set of metrics, including throughput and CPU, memory, and network utilization. Experiments were carried out on testbeds of three different sizes with a range of scheduler configurations, including FCFS, backfilling, fair share, and SJF scheduling techniques.
A head-to-head comparison of the scheduling techniques is also presented, highlighting the effect of the RMS on the performance of each technique. The results show that the relative difference in performance among the scheduling techniques reached up to 63%. We conclude from the experiments that no single RMS can be identified as the best, but SLURM performs better than the others in most cases.
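To make the idea of comparing scheduling techniques concrete, the following minimal Python sketch contrasts FCFS and SJF ordering on a toy workload. It is an illustration only, not the experimental setup evaluated here: it assumes a single processor, jobs that all arrive at time zero, and known runtimes. The function names, the sample runtimes, and the relative-difference formula are hypothetical choices made for this example; the abstract does not define how its relative difference was computed.

```python
from typing import List, Sequence


def mean_wait(runtimes: Sequence[float]) -> float:
    """Mean waiting time when jobs run back to back in the given order."""
    elapsed = 0.0
    total_wait = 0.0
    for runtime in runtimes:
        total_wait += elapsed  # this job waited for every job before it
        elapsed += runtime
    return total_wait / len(runtimes)


def fcfs_order(runtimes: Sequence[float]) -> List[float]:
    """First come, first served: keep the submission order."""
    return list(runtimes)


def sjf_order(runtimes: Sequence[float]) -> List[float]:
    """Shortest job first: run jobs in ascending order of runtime."""
    return sorted(runtimes)


def relative_difference(a: float, b: float) -> float:
    """Assumed definition: gap between two values relative to the larger one."""
    return abs(a - b) / max(a, b)


# Hypothetical workload: runtimes in arbitrary time units, in submission order.
jobs = [8.0, 1.0, 3.0, 2.0]

fcfs_wait = mean_wait(fcfs_order(jobs))  # waits 0, 8, 9, 12 -> 7.25
sjf_wait = mean_wait(sjf_order(jobs))    # waits 0, 1, 3, 6  -> 2.5
gap = relative_difference(fcfs_wait, sjf_wait)

print(f"FCFS mean wait: {fcfs_wait}")
print(f"SJF mean wait:  {sjf_wait}")
print(f"Relative difference: {gap:.0%}")
```

On this toy workload SJF gives a shorter mean wait because the long first job no longer delays the short ones. Real cluster schedulers must also handle staggered arrivals, multi-node jobs, and inaccurate runtime estimates, none of which this sketch models.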
Cite this article
Qureshi, K., Shah, S.M.H. & Manuel, P. Empirical performance evaluation of schedulers for cluster of workstations. Cluster Comput 14, 101–113 (2011). https://doi.org/10.1007/s10586-010-0128-5
DOI: https://doi.org/10.1007/s10586-010-0128-5