Abstract
User estimates of job runtimes have emerged as an important component of the workload on parallel machines, and can have a significant impact on how a scheduler treats different jobs, and thus on overall performance. It is therefore highly desirable to have a good model of the relationship between parallel jobs and their associated estimates. We construct such a model based on a detailed analysis of several workload traces. The model incorporates those features that are consistent in all of the logs, most notably the inherently modal nature of estimates (e.g. only 20 different values are used as estimates for about 90% of the jobs). We find that the behavior of users, as manifested through the estimate distributions, is remarkably similar across the different workload traces. Indeed, providing our model with only the maximal allowed estimate value, along with the percentage of jobs that have used it, yields results that are very similar to the original. The remaining difference (if any) is largely eliminated by providing information on one or two additional popular estimates. Consequently, in comparison to previous models, simulations that utilize our model are better in reproducing scheduling behavior similar to that observed when using real estimates.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Chiang, S.-H., Arpaci-Dusseau, A., Vernon, M.K.: The impact of more accurate requested runtimes on production job scheduling performance. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2002. LNCS, vol. 2537, pp. 103–127. Springer, Heidelberg (2002)
Chiang, S.-H., Vernon, M.K.: Characteristics of a large shared memory production workload. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 159–187. Springer, Heidelberg (2001)
Cirne, W., Berman, F.: A comprehensive model of the supercomputer workload. In: 4th Workshop on Workload Characterization (December 2001)
Cirne, W., Berman, F.: A model for moldable supercomputer jobs. In: 15th Intl. Parallel & Distributed Processing Symp. (April 2001)
Crovella, M.E.: Performance evaluation with heavy tailed distributions. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 1–10. Springer, Heidelberg (2001)
Downey, A.B.: A parallel workload model and its implications for processor allocation. In: 6th Intl. Symp. High Performance Distributed Comput., August 1997, pp. 112–124 (1997)
Etsion, Y., Tsafrir, D.: A Short Survey of Commercial Cluster Batch Schedulers. Technical Report 2005-13, Hebrew University (May 2005)
Feitelson, D.G.: Experimental analysis of the root causes of performance evaluation results: a backfilling case study. IEEE Trans. Parallel & Distributed Syst. 16(2), 175–182 (2005)
Feitelson, D.G.: Parallel workloads archive, http://www.cs.huji.ac.il/labs/parallel/workload
Feitelson, D.G., Jette, M.A.: Improved utilization and responsiveness with gang scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1997 and JSSPP 1997. LNCS, vol. 1291, pp. 238–261. Springer, Heidelberg (1997)
Feitelson, D.G., Mu’alem Weil, A.: Utilization and predictability in scheduling the IBM SP2 with backfilling. In: 12th Intl. Parallel Processing Symp., April 1998, pp. 542–546 (1998)
Feitelson, D.G., Nitzberg, B.: Job characteristics of a production parallel scientific workload on the NASA Ames iPSC/860. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 337–360. Springer, Heidelberg (1995)
Frachtenberg, E., Feitelson, D.G., Fernandez, J., Petrini, F.: Parallel job scheduling under dynamic workloads. In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2003. LNCS, vol. 2862, pp. 208–227. Springer, Heidelberg (2003)
Gibbons, R.: A historical application profiler for use by parallel schedulers. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1997 and JSSPP 1997. LNCS, vol. 1291, pp. 58–77. Springer, Heidelberg (1997)
Jann, J., Pattnaik, P., Franke, H., Wang, F., Skovira, J., Riodan, J.: Modeling of workload in MPPs. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1997 and JSSPP 1997. LNCS, vol. 1291, pp. 95–116. Springer, Heidelberg (1997)
Keleher, P.J., Zotkin, D., Perkovic, D.: Attacking the bottlenecks of backfilling schedulers. Cluster Comput. 3(4), 255–263 (2000)
Lee, C.B., Schwartzman, Y., Hardy, J., Snavely, A.: Are user runtime estimates inherently inaccurate? In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 253–263. Springer, Heidelberg (2005)
Li, H., Groep, D., Wolters, J.T.L.: Predicting job start times on clusters. In: International Symposium on Cluster Computing and the Grid, CCGrid (2004)
Lifka, D.: The ANL/IBM SP scheduling system. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 295–303. Springer, Heidelberg (1995)
Lublin, U., Feitelson, D.G.: The workload on parallel supercomputers: modeling the characteristics of rigid jobs. J. Parallel & Distributed Comput. 63(11), 1105–1122 (2003)
Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel & Distributed Syst. 12(6), 529–543 (2001)
Perkovic, D., Keleher, P.J.: Randomization, speculation, and adaptation in batch schedulers. In: Supercomputing, September 2000, p. 7 (2000)
Smith, W., Foster, I., Taylor, V.: Predicting application run times using historical information. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1998, SPDP-WS 1998, and JSSPP 1998. LNCS, vol. 1459, pp. 122–142. Springer, Heidelberg (1998)
Talby, D.: User Modeling of Parallel Workloads. PhD thesis, The Hebrew University of Jerusalem, Israel (2000) (in preparation)
Tsafrir, D., Etsion, Y., Feitelson, D.G.: Backfilling Using Runtime Predictions Rather Than User Estimates. Technical Report 2005-5, Hebrew University (February 2005)
Tsafrir, D., Feitelson, D.G.: Workload Flurries. Technical Report 2003-85, Hebrew University (November 2003)
Zhang, Y., Franke, H., Moreira, J.E., Sivasubramaniam, A.: An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. In: Feitelson, D.G., Rudolph, L. (eds.) JSSPP 2001. LNCS, vol. 2221, pp. 133–158. Springer, Heidelberg (2001)
Zilber, J., Amit, O., Talby, D.: What is worth learning from parallel workloads? A user and session based analysis. In: Intl. Conf. Supercomputing (June 2005)
Zotkin, D., Keleher, P.J.: Job-length estimation and performance in backfilling schedulers. In: 8th Intl. Symp. High Performance Distributed Comput. (August 1999)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tsafrir, D., Etsion, Y., Feitelson, D.G. (2005). Modeling User Runtime Estimates. In: Feitelson, D., Frachtenberg, E., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2005. Lecture Notes in Computer Science, vol 3834. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11605300_1
Download citation
DOI: https://doi.org/10.1007/11605300_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31024-2
Online ISBN: 978-3-540-31617-6
eBook Packages: Computer ScienceComputer Science (R0)