A flexible processor mapping technique toward data localization for block-cyclic data redistribution

The Journal of Supercomputing

Abstract

Array redistribution is often needed to execute a data-parallel program more efficiently on distributed-memory multicomputers. To minimize the data transfer cost of redistribution, processor mapping techniques have been proposed that reduce the number of redistributed data elements. These techniques require that the beginning data elements on each processor not be redistributed. To satisfy practical computation needs, however, a programmer may require other data elements to remain un-redistributed (localized). In this paper, we propose a flexible processor mapping technique for Block-Cyclic redistribution that allows the programmer to localize the required data elements. We also present an efficient redistribution method for redistributions that employ the proposed technique. Our experimental results analyze and present the reduction in data transfer cost and the improvement in system performance for redistributions with data localization.
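
To make the notion of data localization concrete, the following minimal sketch counts which elements stay on the same physical processor when a one-dimensional array moves from one block-cyclic distribution to another. It is an illustration under stated assumptions, not the method proposed in the paper: the function names `owner` and `localized`, and the representation of a processor mapping as a tuple from logical destination processors to physical processors, are hypothetical choices made for this example.

```python
def owner(i, block, nprocs):
    """Logical processor that owns global index i under a block-cyclic
    distribution with the given block size over nprocs processors."""
    return (i // block) % nprocs


def localized(n, src_block, dst_block, nprocs, mapping):
    """Return the global indices that need no data transfer.

    mapping[q] is the physical processor assigned to logical destination
    processor q (an assumed representation for illustration only).
    Source logical processors are taken to equal physical processors.
    """
    return [
        i
        for i in range(n)
        if owner(i, src_block, nprocs) == mapping[owner(i, dst_block, nprocs)]
    ]


# Redistribute 8 elements from block size 1 to block size 2 on 2 processors.
print(localized(8, 1, 2, 2, (0, 1)))  # identity mapping -> [0, 3, 4, 7]
print(localized(8, 1, 2, 2, (1, 0)))  # swapped mapping  -> [1, 2, 5, 6]
```

In this small case both mappings localize four elements, but they localize different ones: the identity mapping keeps the beginning element in place, while the swapped mapping keeps other elements in place. This is the kind of choice that a flexible mapping leaves to the programmer.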



Author information

Corresponding author

Correspondence to Jih-Woei Huang.

About this article

Cite this article

Huang, JW., Chu, CP. A flexible processor mapping technique toward data localization for block-cyclic data redistribution. J Supercomput 45, 151–172 (2008). https://doi.org/10.1007/s11227-007-0166-9
