Scheduling Restartable Jobs with Short Test Runs

  • Conference paper
Job Scheduling Strategies for Parallel Processing (JSSPP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5798)

Abstract

In this paper, we examine the concept of giving every job a trial run before committing it to run to completion. Trial runs allow jobs that fail immediately to be detected shortly after submission, and they benefit short jobs by letting them run and finish early. This occurs without inflicting a significant penalty on longer jobs, whose average and maximum waiting times are actually improved in some cases. The strategy does not require preemption; instead, it uses the ability to kill a job and restart it from the beginning, which it does at most once for each job. While others have proposed similar strategies, our algorithm is distinguished by its determination to give each job a fixed-length trial run as soon as possible. Our study is also more focused, including a detailed description of the algorithm and an examination of the effect of varying the length of a trial run.
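
The abstract describes the policy only in outline, so the following is a minimal Python sketch of the basic idea under simplifying assumptions of our own, not the authors' algorithm (whose details are in the paper). Assumed: one machine with `total_procs` processors; strict FCFS within two queues; trials always take precedence over full runs; no reservations or backfilling. Actual runtimes drive only the simulation clock, and the scheduler never uses runtime estimates. `Job`, `run_length`, `simulate`, `trial_len`, and `fails_at` are hypothetical names introduced for illustration.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from math import inf
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    name: str
    arrival: float
    procs: int                        # processors requested
    runtime: float                    # time needed to complete if it never crashes
    fails_at: Optional[float] = None  # hypothetical: crash time into the run, if any


def run_length(job: Job, limit: float) -> Tuple[float, str]:
    """How long a run of `job` lasts when capped at `limit`, and how it ends."""
    crashes = job.fails_at is not None and job.fails_at < job.runtime
    stop = job.fails_at if crashes else job.runtime
    if stop <= limit:
        return stop, ("failed" if crashes else "done")
    return limit, "killed"  # trial expired: kill now, restart from scratch later


def simulate(jobs: List[Job], total_procs: int,
             trial_len: float) -> Dict[str, Tuple[str, float, str]]:
    """Return {name: (outcome, finish_time, phase)}; phase is 'trial' or 'full'."""
    assert all(j.procs <= total_procs for j in jobs), "every job must fit the machine"
    pending = sorted(jobs, key=lambda j: j.arrival)
    trial_q: deque = deque()  # jobs that have not had their trial run yet
    full_q: deque = deque()   # jobs killed at the end of their trial, awaiting one full run
    running: list = []        # heap of (end_time, seq, job, phase, outcome)
    free, seq, i = total_procs, 0, 0
    results: Dict[str, Tuple[str, float, str]] = {}

    def start(job: Job, phase: str, now: float) -> None:
        # A trial is capped at trial_len; a full run is never cut short.
        nonlocal free, seq
        limit = trial_len if phase == "trial" else inf
        duration, outcome = run_length(job, limit)
        free -= job.procs
        heapq.heappush(running, (now + duration, seq, job, phase, outcome))
        seq += 1

    while i < len(pending) or trial_q or full_q or running:
        next_arrival = pending[i].arrival if i < len(pending) else inf
        next_end = running[0][0] if running else inf
        now = min(next_arrival, next_end)

        # Completions first, so freed processors are available to new starts.
        while running and running[0][0] <= now:
            end, _, job, phase, outcome = heapq.heappop(running)
            free += job.procs
            if outcome == "killed":
                full_q.append(job)  # only trials are killed, so at most one restart per job
            else:
                results[job.name] = (outcome, end, phase)

        while i < len(pending) and pending[i].arrival <= now:
            trial_q.append(pending[i])
            i += 1

        # Trials take precedence; full runs start only when no trial is waiting.
        while trial_q and trial_q[0].procs <= free:
            start(trial_q.popleft(), "trial", now)
        while not trial_q and full_q and full_q[0].procs <= free:
            start(full_q.popleft(), "full", now)

    return results


jobs = [
    Job("A", arrival=0, procs=4, runtime=100),             # long job
    Job("B", arrival=1, procs=2, runtime=5),               # short job
    Job("C", arrival=2, procs=2, runtime=50, fails_at=3),  # crashes early
]
print(simulate(jobs, total_procs=4, trial_len=10))
# {'C': ('failed', 13, 'trial'), 'B': ('done', 15, 'trial'), 'A': ('done', 115, 'full')}
```

In this toy trace, C's early crash is detected at time 13 and the short job B finishes at 15, whereas under plain FCFS both would wait behind A until time 100. A finishes later (115 instead of 100) because its first 10 time units are discarded. Whether that cost stays small on real workloads is what the paper's evaluation addresses, and a toy trace cannot settle it.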

References

  1. Feitelson, D.G., Rudolph, L. (eds.): IPPS-WS 1997 and JSSPP 1997. LNCS, vol. 1291. Springer, Heidelberg (1997)

  2. Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.): JSSPP 2002. LNCS, vol. 2537. Springer, Heidelberg (2002)

  3. Chiang, S.-H., Arpaci-Dusseau, A., Vernon, M.K.: The impact of more accurate requested runtimes on production job scheduling performance. In: Proc. 8th Workshop on Job Scheduling Strategies for Parallel Processing [2], pp. 103–127

  4. Chiang, S.-H., Mansharamani, R., Vernon, M.: Use of application characteristics and limited preemption for run-to-completion parallel processor scheduling policies. In: Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pp. 33–44 (1994)

  5. Chiang, S.-H., Vernon, M.K.: Production job scheduling for parallel shared memory systems. In: Proc. 15th IEEE Intern. Parallel and Distributed Processing Symp. (2001)

  6. Downey, A.B.: Using queue time predictions for processor allocation. In: Proc. 3rd Workshop on Job Scheduling Strategies for Parallel Processing [1], pp. 35–57

  7. Feitelson, D.: The parallel workloads archive, http://www.cs.huji.ac.il/labs/parallel/workload/index.html

  8. Gibbons, R.: A historical application profiler for use by parallel schedulers. In: Proc. 3rd Workshop on Job Scheduling Strategies for Parallel Processing [1]

  9. Kettimuthu, R., Subramani, V., Srinivasan, S., Gopalsamy, T., Panda, D.K., Sadayappan, P.: Selective preemption strategies for parallel job scheduling. Intern. J. of High Performance Computing and Networking 3(2/3), 122–152 (2005)

  10. Lawson, B., Smirni, E., Puiu, D.: Self-adapting backfilling scheduling for parallel systems. In: Proc. 31st Intern. Conf. Parallel Processing, pp. 583–592 (2002)

  11. Lawson, B.G., Smirni, E.: Multiple-queue backfilling scheduling with priorities and reservations for parallel systems. In: Proc. 8th Workshop on Job Scheduling Strategies for Parallel Processing [2]

  12. Lee, C.B., Schwartzman, Y., Hardy, J., Snavely, A.: Are user runtime estimates inherently inaccurate? In: Feitelson, D.G., Rudolph, L., Schwiegelshohn, U. (eds.) JSSPP 2004. LNCS, vol. 3277, pp. 253–263. Springer, Heidelberg (2005)

  13. Lifka, D.: The ANL/IBM SP scheduling system. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1995 and JSSPP 1995. LNCS, vol. 949, pp. 295–303. Springer, Heidelberg (1995)

  14. Mu’alem, A.W., Feitelson, D.G.: Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Trans. Parallel and Distributed Syst. 12(6), 529–543 (2001)

  15. Nissimov, A., Feitelson, D.G.: Probabilistic backfilling. In: Frachtenberg, E., Schwiegelshohn, U. (eds.) JSSPP 2007. LNCS, vol. 4942, pp. 102–115. Springer, Heidelberg (2008)

  16. Perković, D., Keleher, P.J.: Randomization, speculation, and adaptation in batch schedulers. In: Proc. 2000 ACM/IEEE Conf. on Supercomputing (2000)

  17. Schwiegelshohn, U., Yahyapour, R.: Improving first-come-first-serve job scheduling by gang scheduling. In: Feitelson, D.G., Rudolph, L. (eds.) IPPS-WS 1998, SPDP-WS 1998, and JSSPP 1998. LNCS, vol. 1459, pp. 180–198. Springer, Heidelberg (1998)

  18. Shmueli, E., Feitelson, D.G.: On simulation and design of parallel-systems schedulers: Are we doing the right thing? IEEE Trans. Parallel and Distributed Systems (to appear)

  19. Snell, Q.O., Clement, M.J., Jackson, D.B.: Preemption based backfill. In: Proc. 8th Workshop on Job Scheduling Strategies for Parallel Processing [2], pp. 24–37

  20. Srinivasan, S., Kettimuthu, R., Subramani, V., Sadayappan, P.: Characterization of backfilling strategies for parallel job scheduling. In: Proc. Intern. Conf. on Parallel Processing Workshops, pp. 514–522 (2002)

  21. Tsafrir, D., Etsion, Y., Feitelson, D.G.: Backfilling using system-generated predictions rather than user runtime estimates. IEEE Trans. on Parallel and Distributed Systems 18(6), 789–803 (2007)

  22. Tsafrir, D., Feitelson, D.G.: The dynamics of backfilling: Solving the mystery of why increased inaccuracy may help. In: Proc. IEEE Intern. Symp. on Workload Characterization, pp. 131–141 (2006)

  23. Zotkin, D., Keleher, P.J.: Job-length estimation and performance in backfilling schedulers. In: Proc. 8th IEEE International Symposium on High Performance Distributed Computing, pp. 236–243 (1999)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Thebe, O., Bunde, D.P., Leung, V.J. (2009). Scheduling Restartable Jobs with Short Test Runs. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2009. Lecture Notes in Computer Science, vol 5798. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04633-9_7

  • DOI: https://doi.org/10.1007/978-3-642-04633-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04632-2

  • Online ISBN: 978-3-642-04633-9

  • eBook Packages: Computer Science, Computer Science (R0)
