Abstract
The question of whether more accurate requested runtimes can significantly improve production parallel system performance has previously been studied for the FCFS-backfill scheduler, using a limited set of system performance measures. This paper examines the question for higher-performance backfill policies, for heavier system loads such as those observed in current leading-edge production systems (e.g., the large Origin 2000 system at NCSA), and for a broader range of system performance measures. The new results show that more accurate requested runtimes can improve system performance much more than previous results suggested. For example, average slowdown decreases by a factor of two to six, depending on system load and on the fraction of jobs that have the more accurate requests. The new results also show that (a) nearly all of the performance improvement is realized even if the more accurate runtime requests are a factor of two higher than the actual runtimes, (b) most of the performance improvement is achieved when test runs are used to obtain more accurate runtime requests, and (c) in systems where only a fraction (e.g., 60%) of the jobs provide approximately accurate runtime requests, the users who provide those requests achieve even greater performance improvements, such as an order-of-magnitude improvement in average slowdown for jobs with runtimes of up to fifty hours.
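The results above hinge on how a backfill scheduler uses requested runtimes. As a rough illustration only, and not the authors' simulator or the higher-performance policies they study, the minimal Python sketch below implements the basic EASY-style FCFS-backfill check. The first queued job receives a reservation computed from the requested end times of running jobs, and a later job may start early only if it cannot delay that reservation. The `Job` class, the functions `shadow_time`, `can_backfill`, and `slowdown`, and all example numbers are hypothetical. Slowdown is assumed here to mean (wait time + actual runtime) / actual runtime.

```python
from dataclasses import dataclass


@dataclass
class Job:
    nodes: int        # processors the job needs
    requested: float  # user-requested runtime: all the scheduler sees
    actual: float     # true runtime: known only once the job finishes


def slowdown(wait: float, actual: float) -> float:
    """Assumed definition: (wait time + actual runtime) / actual runtime."""
    return (wait + actual) / actual


def shadow_time(now, free_nodes, running, head):
    """Return (time, extra_nodes): the earliest time the queue head can
    start, judged only from the *requested* end times of running jobs,
    and how many nodes will be left over once the head starts then.

    running: list of (requested_end_time, nodes) for executing jobs.
    """
    avail = free_nodes
    if head.nodes <= avail:
        return now, avail - head.nodes
    for end, nodes in sorted(running):  # earliest requested end first
        avail += nodes
        if head.nodes <= avail:
            return end, avail - head.nodes
    raise ValueError("head job needs more nodes than the machine has")


def can_backfill(now, free_nodes, running, head, cand):
    """EASY-style check: may `cand` start now, ahead of `head`?

    It must fit in the free nodes and must not delay the head's
    reservation: either its requested runtime ends by the shadow time,
    or it uses only nodes the head will not need.
    """
    if cand.nodes > free_nodes:
        return False
    shadow, extra = shadow_time(now, free_nodes, running, head)
    return now + cand.requested <= shadow or cand.nodes <= extra


if __name__ == "__main__":
    # Hypothetical state: 16 nodes free; running jobs are requested
    # to end at t=20 (16 nodes) and t=50 (32 nodes).
    running = [(20.0, 16), (50.0, 32)]
    head = Job(nodes=32, requested=60.0, actual=60.0)
    # A 16-node job that actually runs for 8 time units, with three
    # requested runtimes: exact, 2x too high, and 12.5x too high.
    for req in (8.0, 16.0, 100.0):
        cand = Job(nodes=16, requested=req, actual=8.0)
        print(req, can_backfill(0.0, 16, running, head, cand))
    print(slowdown(wait=10.0, actual=5.0))
    # Prints "8.0 True", "16.0 True", "100.0 False", then "3.0".
```

Because the scheduler sees only `requested`, the same 8-unit job backfills when its request is exact or twice too high, but it is held back when the request is grossly inflated, since it then appears to overlap the head job's reservation at t=20.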
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chiang, SH., Arpaci-Dusseau, A., Vernon, M.K. (2002). The Impact of More Accurate Requested Runtimes on Production Job Scheduling Performance. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2002. Lecture Notes in Computer Science, vol 2537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36180-4_7
DOI: https://doi.org/10.1007/3-540-36180-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00172-0
Online ISBN: 978-3-540-36180-0
eBook Packages: Springer Book Archive