
Deriving Array Distributions by Optimization Techniques


Abstract

The paper presents a new method for deriving data distributions for parallel computers with distributed memory using a mathematical optimization technique. Prerequisites for this approach are a parameterized data distribution and a rigorous performance prediction technique that allows runtime formulas containing the parameters of the data distribution to be derived. An optimization technique can then determine the parameters such that the total runtime is minimized, thus also minimizing the communication overhead and the load-imbalance penalty. The method is demonstrated by using it to determine a data distribution for the LU decomposition of a matrix.
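To make the approach concrete, the following is a minimal sketch, not taken from the paper: it assumes a hypothetical row block-cyclic distribution of an n-by-n matrix over p processors with the block size b as the single distribution parameter, and a toy runtime model for blocked LU decomposition (compute share, a rough load-imbalance term, and one pivot-block broadcast per elimination step). All constants and the model form are illustrative assumptions; the paper's actual runtime formulas are rigorously derived and more detailed. The "optimization technique" here is reduced to exhaustive search over candidate block sizes.

```python
# Illustrative sketch only: the runtime model below is a hypothetical
# stand-in for the parameterized runtime formulas described in the paper.

def lu_runtime(n, p, b, t_op=1e-9, t_startup=1e-5, t_word=1e-8):
    """Predicted runtime of blocked LU decomposition with a row
    block-cyclic distribution of block size b over p processors.

    t_op      -- time per arithmetic operation (assumed constant)
    t_startup -- message startup latency (assumed constant)
    t_word    -- per-word transfer time (assumed constant)
    """
    steps = n // b                        # elimination steps over blocks
    # computation: ~ (2/3) n^3 operations shared by p processors
    compute = (2.0 / 3.0) * n**3 * t_op / p
    # load imbalance: idle time that grows with the block size (rough term)
    imbalance = b * n**2 * t_op / p
    # communication: one pivot-block broadcast of b*n words per step
    communicate = steps * (t_startup + b * n * t_word)
    return compute + imbalance + communicate

def best_block_size(n, p, candidates=None):
    """Return the block size b minimizing the predicted runtime."""
    if candidates is None:
        candidates = [b for b in range(1, n + 1) if n % b == 0]
    return min(candidates, key=lambda b: lu_runtime(n, p, b))

if __name__ == "__main__":
    n, p = 2048, 16
    b = best_block_size(n, p)
    print(f"block size b = {b}, predicted runtime = {lu_runtime(n, p, b):.4f} s")
```

Even in this toy form, the trade-off the paper exploits is visible: a small b reduces the imbalance term but multiplies the number of message startups, while a large b does the opposite, so the minimizing b balances communication overhead against load imbalance.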




Cite this article

Rauber, T., Rünger, G. Deriving Array Distributions by Optimization Techniques. The Journal of Supercomputing 15, 271–293 (2000). https://doi.org/10.1023/A:1008164427332
