A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers

The Journal of Supercomputing

Abstract

Array redistribution is often required in programs running on distributed memory parallel computers, and an inefficient redistribution algorithm can degrade program performance considerably. The redistribution overhead consists of two parts: index computation and inter-processor communication. In this paper, using a notation for local data description called an LDD, we propose a framework that optimizes array redistribution algorithms in both index computation and inter-processor communication. We present an efficient index computation method and generate a communication schedule that minimizes the number of communication steps and eliminates node contention in each step. Experiments show the efficiency and flexibility of our techniques.
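
The two cost components named in the abstract can be illustrated with a small sketch. It is not the LDD-based method of the paper, whose details are not given on this page. It assumes a one-dimensional array redistributed from cyclic(x) to cyclic(y) over the same set of processors, finds the indices to move by brute force, and groups the messages with a simple shift pattern so that no processor sends or receives more than one message per step. The shift pattern avoids node contention but is not claimed to use the minimum number of steps. All function names here are hypothetical.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor that owns global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def send_sets(n, nprocs, src_block, dst_block):
    """Index computation (brute force): map (src, dst) to the indices that move."""
    sets = defaultdict(list)
    for i in range(n):
        s = owner(i, src_block, nprocs)
        d = owner(i, dst_block, nprocs)
        if s != d:  # elements that stay on the same processor need no message
            sets[(s, d)].append(i)
    return dict(sets)


def shift_schedule(sets, nprocs):
    """Communication schedule: in step k, processor p sends only to (p + k) % nprocs.

    For a fixed k the map p -> (p + k) % nprocs is a bijection, so each step
    has at most one send and one receive per processor (no node contention).
    Steps in which nothing needs to be sent are skipped.
    """
    steps = []
    for k in range(1, nprocs):
        step = [(p, (p + k) % nprocs) for p in range(nprocs)
                if (p, (p + k) % nprocs) in sets]
        if step:
            steps.append(step)
    return steps
```

For example, `send_sets(12, 2, 1, 2)` describes redistributing 12 elements from cyclic(1) to cyclic(2) on two processors, and passing its result to `shift_schedule` yields a single step in which the two processors exchange messages.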

Cite this article

Guo, M., Nakata, I. A Framework for Efficient Data Redistribution on Distributed Memory Multicomputers. The Journal of Supercomputing 20, 243–265 (2001). https://doi.org/10.1023/A:1011602732570
