ABSTRACT
Loops are a major source of parallelism in many numerical applications. An important issue in the parallel execution of loops is how to schedule iterations so that the workload is well balanced across the processors. Most existing loop scheduling algorithms were designed for shared-memory multiprocessors with uniform memory access costs; they are not well suited to distributed-memory multiprocessors, where data locality is a major concern and communication costs are high. This paper presents a new scheduling algorithm that takes data locality into account. Our approach combines static and dynamic scheduling in a two-level (overlapped) fashion, so that data locality is exploited while communication costs remain bounded. The performance of the new algorithm is evaluated on a CM-5 message-passing distributed-memory multiprocessor.
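To make the two-level idea concrete, here is a minimal single-process sketch in C. It assumes a block distribution of iterations (so each processor's static share touches only local data), a static fraction SF of each processor's iterations executed at the first level, and a guided-self-scheduling-style rule (chunk = remaining / P) for doling out the dynamic remainder at the second level; N, P, SF, and the chunking rule are illustrative assumptions, not the paper's actual parameters.

```c
#include <stdio.h>

#define N  1000   /* total loop iterations (assumed)                  */
#define P  4      /* number of processors (assumed)                   */
#define SF 0.75   /* fraction scheduled statically (assumed)          */

int main(void) {
    int local = N / P;               /* iterations owned per processor (block) */
    int stat  = (int)(SF * local);   /* level 1: executed statically, on local data */
    int pool  = P * (local - stat);  /* level 2: leftover iterations in a global pool */

    printf("per-processor static chunk: %d iterations (local data)\n", stat);

    /* Dynamic phase: hand out decreasing chunks from the pool,
       guided-self-scheduling style (chunk = remaining / P).      */
    int chunk, step = 0;
    while (pool > 0) {
        chunk = pool / P;
        if (chunk == 0) chunk = 1;
        printf("dynamic chunk %d: %d iterations\n", ++step, chunk);
        pool -= chunk;
    }
    return 0;
}
```

The large static share keeps most accesses local, while the shrinking dynamic chunks absorb load imbalance at the end of the loop with a bounded number of remote assignments.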