Abstract
The space of job schedulers for parallel supercomputers is rather fragmented, because different researchers tend to make different assumptions about the goals of the scheduler, the information available about the workload, and the operations the scheduler may perform. We argue that by identifying these assumptions explicitly, it is possible to achieve some degree of convergence. For example, most of the different assumptions can be united in a common framework by associating a suitable cost function with the execution of each job. The cost function reflects what is known about the job and how well it fits the goals of the system. Given such cost functions, the scheduler's task is to maximize the system's profit.
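To make the cost-function idea concrete, here is a minimal Python sketch. It rests on two assumptions that do not come from the paper. First, each job's cost function is expressed as a profit that depends only on the job's completion time. Second, jobs run one at a time on the whole machine. The names `Job`, `schedule_profit`, `best_schedule`, and `linear_decay` are hypothetical. The exhaustive search only illustrates "choose the schedule that maximizes total profit"; a practical scheduler would need an online or heuristic policy instead.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    runtime: float
    # Hypothetical profit function: maps the job's completion time to the
    # value the system earns by finishing it then (i.e. a negated cost).
    profit: Callable[[float], float]


def schedule_profit(order: Sequence[Job]) -> float:
    """Total profit when jobs run back to back, one at a time, in `order`."""
    clock = 0.0
    total = 0.0
    for job in order:
        clock += job.runtime  # the job completes at the updated clock
        total += job.profit(clock)
    return total


def best_schedule(jobs: Sequence[Job]) -> Tuple[Tuple[Job, ...], float]:
    """Try every order and keep the most profitable one (tiny inputs only)."""
    best_order: Tuple[Job, ...] = tuple(jobs)
    best_value = schedule_profit(best_order)
    for order in permutations(jobs):
        value = schedule_profit(order)
        if value > best_value:
            best_order, best_value = order, value
    return best_order, best_value


def linear_decay(base: float, rate: float) -> Callable[[float], float]:
    """Hypothetical cost model: profit starts at `base` and drops by `rate`
    per unit of completion time."""
    return lambda t: base - rate * t


jobs = [
    Job("interactive", runtime=1.0, profit=linear_decay(10.0, 5.0)),
    Job("batch", runtime=4.0, profit=linear_decay(20.0, 0.5)),
]
order, value = best_schedule(jobs)
print([j.name for j in order], value)  # ['interactive', 'batch'] 22.5
```

The short interactive job loses value quickly, so running it first gives the higher total profit: 22.5, compared with 3.0 for the reverse order. Different user or system goals would appear only as different profit functions. The scheduler itself would not change.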
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feitelson, D.G., Rudolph, L. (1996). Toward convergence in job schedulers for parallel supercomputers. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1996. Lecture Notes in Computer Science, vol 1162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0022284
DOI: https://doi.org/10.1007/BFb0022284
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61864-5
Online ISBN: 978-3-540-70710-3
eBook Packages: Springer Book Archive