Abstract
Array redistribution is often required in programs on distributed memory parallel computers. Efficient redistribution algorithms are essential; otherwise program performance degrades considerably. The redistribution overhead consists of two parts: index computation and inter-processor communication. In this paper, using a notation for local data description called an LDD, we propose a framework that optimizes array redistribution in both index computation and inter-processor communication. We present an efficient index computation method and generate a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. Experiments show the efficiency and flexibility of our techniques.
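The two overheads named in the abstract can be illustrated with a minimal sketch. It is not the LDD-based method of the paper, whose details are not given here. It assumes a one-dimensional array redistributed from cyclic(x) to cyclic(y) over the same processors, and it swaps in a simple shift schedule in which processor s sends to processor (s + k) mod P in step k. Each step is then a partial permutation, so no node sends or receives more than one message per step, but the number of steps is not guaranteed to be minimal. All function names are hypothetical.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor that owns global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def local_index(i, block, nprocs):
    """Position of global index i inside its owner's local array."""
    return (i // (block * nprocs)) * block + i % block


def send_sets(n, nprocs, src_block, dst_block):
    """Index computation: map (source, destination) to the global indices moved."""
    msgs = defaultdict(list)
    for i in range(n):
        s = owner(i, src_block, nprocs)
        d = owner(i, dst_block, nprocs)
        msgs[(s, d)].append(i)
    return msgs


def shift_schedule(msgs, nprocs):
    """Communication: step k pairs s with (s + k) % nprocs, skipping empty messages.

    Local copies (k = 0) need no communication and are left out.
    """
    steps = []
    for k in range(1, nprocs):
        step = [(s, (s + k) % nprocs)
                for s in range(nprocs)
                if (s, (s + k) % nprocs) in msgs]
        if step:
            steps.append(step)
    return steps


if __name__ == "__main__":
    # Assumed example: 12 elements, 3 processors, cyclic(1) to cyclic(2).
    msgs = send_sets(12, 3, 1, 2)
    for step_no, step in enumerate(shift_schedule(msgs, 3), start=1):
        for s, d in step:
            print(f"step {step_no}: P{s} -> P{d} sends {msgs[(s, d)]}")
```

The brute-force loop over all n elements is chosen only for clarity; an efficient index computation method, as the abstract describes, would avoid visiting every element.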
Cite this article
Guo, M., Nakata, I. A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers. The Journal of Supercomputing 20, 243–265 (2001). https://doi.org/10.1023/A:1011602732570
DOI: https://doi.org/10.1023/A:1011602732570