Toward convergence in job schedulers for parallel supercomputers

  • Conference paper

Job Scheduling Strategies for Parallel Processing (JSSPP 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1162)

Abstract

The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is possible to reach a level of convergence. For example, it is possible to unite most of the different assumptions into a common framework by associating a suitable cost function with the execution of each job. The cost function reflects knowledge about the job and the degree to which it fits the goals of the system. Given such cost functions, scheduling is done to maximize the system's profit.
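The cost-function framework sketched in the abstract can be illustrated in code. The snippet below is a minimal, hypothetical sketch (the `Job` class, the linear value functions, and the greedy value-per-runtime rule are illustrative assumptions, not the paper's actual algorithm): each job carries a value function of its completion time, and the scheduler orders jobs to maximize the system's accumulated profit.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Job:
    name: str
    runtime: float                   # known or estimated execution time
    value: Callable[[float], float]  # profit earned if the job completes at time t

def schedule_by_profit(jobs: List[Job]) -> List[Job]:
    """Greedy sketch: at each step, run the job whose completion now
    yields the highest value per unit of runtime consumed."""
    clock = 0.0
    order: List[Job] = []
    remaining = list(jobs)
    while remaining:
        best = max(remaining,
                   key=lambda j: j.value(clock + j.runtime) / j.runtime)
        clock += best.runtime
        order.append(best)
        remaining.remove(best)
    return order

if __name__ == "__main__":
    # Value decays linearly with completion time (a deadline-like goal).
    jobs = [
        Job("short", 1.0, lambda t: max(0.0, 10.0 - t)),
        Job("long",  4.0, lambda t: max(0.0, 20.0 - t)),
    ]
    print([j.name for j in schedule_by_profit(jobs)])
```

A real scheduler would also fold in the other assumptions the paper discusses (available workload information, preemption and migration capabilities) by making them parameters of the value function.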




Editor information

Dror G. Feitelson, Larry Rudolph

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feitelson, D.G., Rudolph, L. (1996). Toward convergence in job schedulers for parallel supercomputers. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1996. Lecture Notes in Computer Science, vol 1162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022284

  • DOI: https://doi.org/10.1007/BFb0022284

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61864-5

  • Online ISBN: 978-3-540-70710-3

  • eBook Packages: Springer Book Archive
