
Processor preallocation and load balancing of DOALL loops

The Journal of Supercomputing

Abstract

Load balance is important because it can affect the speedup attained through the concurrent execution of loop iterations on a parallel processor. We study loop load balance in the context of the well-known Perfect benchmarks. Several static and dynamic characteristics of the Perfect benchmark DOALL loops are observed and interpreted. The late arrival of processors is identified as a major source of load imbalance. This observation suggested the idea of processor preallocation. An analytic cost model is presented, and the advantages of processor preallocation are demonstrated by experimental evaluation on a CRAY Y-MP8 under the Unicos operating system.
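The abstract does not state the paper's cost model, so the following is only a hypothetical toy model that illustrates why late-arriving processors unbalance a DOALL loop. It assumes n iterations of equal cost t, a list of per-processor arrival delays (all names are invented for illustration), and compares three cases: static block scheduling, one-iteration-at-a-time self-scheduling with a per-grab overhead h (a standard technique swapped in for comparison, not necessarily one the authors evaluate), and a hypothetical form of preallocation in which processors are requested `serial` time units before the loop begins, so only the delay beyond that remains visible.

```python
import heapq
from typing import List


def static_finish(arrivals: List[float], n: int, t: float) -> float:
    """Static block scheduling: processor p owns a fixed share of the n iterations,
    so the loop ends when the slowest (arrival + share) processor finishes."""
    P = len(arrivals)
    base, extra = divmod(n, P)
    # The first `extra` processors receive one additional iteration.
    return max(a + (base + (1 if p < extra else 0)) * t
               for p, a in enumerate(arrivals))


def self_sched_finish(arrivals: List[float], n: int, t: float, h: float) -> float:
    """Self-scheduling: the next free processor grabs one iteration at a time,
    paying a dispatch overhead h per grab. Late processors simply do less work."""
    free_at = list(arrivals)          # time at which each processor is next free
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n):
        start = heapq.heappop(free_at)
        end = start + h + t
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def preallocated(arrivals: List[float], serial: float) -> List[float]:
    """Hypothetical preallocation: processors are requested `serial` time units
    early (e.g. during preceding serial code), hiding that much of each delay."""
    return [max(0.0, a - serial) for a in arrivals]


if __name__ == "__main__":
    arrivals = [0.0, 5.0, 10.0, 40.0]   # invented delays; one processor is very late
    print(static_finish(arrivals, 100, 1.0))
    print(self_sched_finish(arrivals, 100, 1.0, 0.0))
    print(static_finish(preallocated(arrivals, 50.0), 100, 1.0))
```

With these invented numbers, static scheduling finishes at 65 time units because the last processor arrives at 40 and still owns 25 iterations; self-scheduling finishes at 39 by shifting work away from the late processor, but in practice it pays the overhead h on every iteration; and if the entire delay can be overlapped with earlier serial work, static scheduling finishes at 25. This is the intuition behind attacking the arrival delay itself rather than only redistributing iterations.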

About this article

Cite this article

Elsesser, G.W., Ngo, V.N., Bhattacharya, S. et al. Processor preallocation and load balancing of DOALL loops. J Supercomput 8, 135–161 (1994). https://doi.org/10.1007/BF01204659

DOI: https://doi.org/10.1007/BF01204659