Abstract
Communication set generation significantly influences the performance of parallel programs. However, few studies address communication set generation for irregular applications. In this paper, we propose communication optimization techniques for irregular array references in nested loops. Our methods first determine local array distribution schemes so that the total number of communication messages is minimized. We then show how communication set generation can be supported at compile time by introducing symbolic analysis techniques, in which symbolic solutions of a set of symbolic expressions are obtained under certain restrictions. We present symbolic analysis algorithms that derive these solutions as a set of equalities and inequalities. Finally, experimental results on a CM-5 parallel computer are presented to validate our approach.
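To make the notion of a communication set concrete, the following sketch enumerates, for a block-distributed array, which loop iterations on a given processor reference non-local elements. This is only an illustrative run-time enumeration for a concrete index function; the paper's contribution is deriving such sets symbolically at compile time. All names (`block_owner`, `comm_set`) are ours, not from the paper.

```python
def block_owner(idx, block_size):
    """Owner processor of array element idx under a block distribution."""
    return idx // block_size

def comm_set(p, iters, ref, block_size):
    """Iterations i on processor p for which A[ref(i)] is not local to p.

    These are the iterations that require communication; the paper
    characterizes such sets at compile time via equalities/inequalities.
    """
    return {i for i in iters if block_owner(ref(i), block_size) != p}

# Example: N = 16 elements, 4 processors, block size 4; processor 1 owns
# indices 4..7, iterates over them, and references A[i + 2].
N, P = 16, 4
bs = N // P
local_iters = range(1 * bs, 2 * bs)   # iterations assigned to p = 1
print(sorted(comm_set(1, local_iters, lambda i: i + 2, bs)))  # -> [6, 7]
```

For iterations 6 and 7, the referenced elements A[8] and A[9] fall in processor 2's block, so only those two iterations of processor 1 need communication.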
Cite this article
Guo, M., Pan, Y. & Liu, Z. Symbolic Communication Set Generation for Irregular Parallel Applications. The Journal of Supercomputing 25, 199–214 (2003). https://doi.org/10.1023/A:1024262610201