Toward convergence in job schedulers for parallel supercomputers

  • Conference paper

Job Scheduling Strategies for Parallel Processing (JSSPP 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1162)

Abstract

The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information that is available about the workload, and the operations that the scheduler may perform. We argue that by identifying these assumptions explicitly, it is possible to reach a level of convergence. For example, it is possible to unite most of the different assumptions into a common framework by associating a suitable cost function with the execution of each job. The cost function reflects knowledge about the job and the degree to which it fits the goals of the system. Given such cost functions, scheduling is done to maximize the system's profit.
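The cost-function framework sketched in the abstract can be illustrated in code. The snippet below is a minimal, hypothetical sketch (the `Job` class, the linear value functions, and the greedy value-per-runtime rule are illustrative assumptions, not the paper's actual algorithm): each job carries a value function of its completion time, and the scheduler orders jobs to maximize the system's accumulated profit.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Job:
    name: str
    runtime: float                   # known or estimated execution time
    value: Callable[[float], float]  # profit earned if the job completes at time t

def schedule_by_profit(jobs: List[Job]) -> List[Job]:
    """Greedy sketch: at each step, run the job whose completion now
    yields the highest value per unit of runtime consumed."""
    clock = 0.0
    order: List[Job] = []
    remaining = list(jobs)
    while remaining:
        best = max(remaining,
                   key=lambda j: j.value(clock + j.runtime) / j.runtime)
        clock += best.runtime
        order.append(best)
        remaining.remove(best)
    return order

if __name__ == "__main__":
    # Value decays linearly with completion time (a deadline-like goal).
    jobs = [
        Job("short", 1.0, lambda t: max(0.0, 10.0 - t)),
        Job("long",  4.0, lambda t: max(0.0, 20.0 - t)),
    ]
    print([j.name for j in schedule_by_profit(jobs)])
```

A real scheduler would also fold in the other assumptions the paper discusses (available workload information, preemption and migration capabilities) by making them parameters of the value function.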




Editor information

Dror G. Feitelson, Larry Rudolph

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feitelson, D.G., Rudolph, L. (1996). Toward convergence in job schedulers for parallel supercomputers. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1996. Lecture Notes in Computer Science, vol 1162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022284

  • DOI: https://doi.org/10.1007/BFb0022284

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61864-5

  • Online ISBN: 978-3-540-70710-3

  • eBook Packages: Springer Book Archive
