ABSTRACT
Efficient job scheduling must ensure high throughput and good performance. Moreover, in highly parallel systems where processors are a critical resource, high machine utilization becomes essential. Backfilling consists of moving jobs ahead in the queue, provided that they do not delay certain previously submitted jobs. When the execution time of a backfilled job has been underestimated, some action must be taken: abort it, suspend and later resume it, checkpoint and restart it, or let it keep executing. In this paper we propose an alternative for that situation, which consists of applying Virtual Malleability to the backfilled job. This means that its processor partition is reduced and, as MPI jobs are not really malleable, we make the job contend with itself for the use of processors by applying Co-scheduling. In this way resources are freed and the job at the head of the queue has a chance to start executing. In addition, as MPI parallel jobs can be Moldable, we add this possibility to the scheme. We obtained about 25% better performance than traditional backfilling, especially under high machine utilization. We also claim portability for our technique, which does not require special support from the operating system, as checkpointing does.
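The backfilling rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common EASY-style variant, in which only the job at the head of the queue holds a reservation. It is not the scheme proposed in the paper, and it does not model Virtual Malleability or Co-scheduling. The names `Job`, `head_reservation`, and `pick_backfill` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int       # processors requested
    estimate: float  # user-estimated run time (may be wrong)


def head_reservation(head, free, running, now):
    """Return (shadow_time, extra) for the head job.

    running: list of (estimated_end_time, procs) for running jobs.
    shadow_time is the earliest time the head job can start;
    extra is the number of processors left over at that time.
    """
    avail = free
    if avail >= head.procs:
        return now, avail - head.procs
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            return end, avail - head.procs
    raise ValueError("head job needs more processors than the machine has")


def pick_backfill(queue, free, running, now):
    """Pick jobs behind the head that can start now without delaying it.

    Assumption: EASY-style backfilling, based on estimated run times.
    """
    head, rest = queue[0], queue[1:]
    shadow, extra = head_reservation(head, free, running, now)
    chosen = []
    for job in rest:
        if job.procs > free:
            continue  # does not fit in the currently idle processors
        if now + job.estimate <= shadow:
            # Expected to finish before the head job's reserved start.
            chosen.append(job)
            free -= job.procs
        elif job.procs <= extra:
            # Runs past the reservation, but only on spare processors.
            chosen.append(job)
            free -= job.procs
            extra -= job.procs
    return chosen


# 8-processor machine: 6 busy until t=10, 2 idle, head job needs 6.
queue = [Job("head", 6, 20.0), Job("A", 2, 5.0), Job("B", 1, 50.0)]
print([j.name for j in pick_backfill(queue, 2, [(10.0, 6)], 0.0)])
```

Because the decision relies on `estimate`, a backfilled job whose run time was underestimated may still hold processors when the head job's reservation arrives. That is the situation the abstract addresses.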