Abstract
Array redistribution is often needed to execute a data-parallel program efficiently on distributed-memory multicomputers. To minimize the data transfer cost of redistribution, processor mapping techniques have been proposed that reduce the number of data elements that must be redistributed. These techniques require that the beginning data elements on a processor not be redistributed. To satisfy practical computation needs, however, a programmer may require other data elements to remain un-redistributed (localized). In this paper, we propose a flexible processor mapping technique for Block-Cyclic redistribution that allows the programmer to localize the required data elements. We also present an efficient redistribution method for redistributions that employ the proposed technique. Our experimental results present and analyze the reduction in data transfer cost and the improvement in system performance for redistributions with data localization.
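To make the terminology concrete, the following is a minimal sketch of what "localized" means in a Block-Cyclic redistribution. It is not the mapping technique or the redistribution method proposed in the paper. The names `owner` and `localized_elements`, the `mapping` list (destination logical processor to physical processor), and the small array size are assumptions chosen only for illustration.

```python
def owner(i, block, nprocs):
    """Logical processor that owns global index i under CYCLIC(block)."""
    return (i // block) % nprocs


def localized_elements(n, src_block, dst_block, nprocs, mapping=None):
    """Return the global indices that stay on the same physical processor.

    mapping[q] is the physical processor assigned to destination logical
    processor q (hypothetical representation; identity if omitted).
    Source logical processors are assumed to equal physical processors.
    """
    if mapping is None:
        mapping = list(range(nprocs))
    return [
        i
        for i in range(n)
        if owner(i, src_block, nprocs) == mapping[owner(i, dst_block, nprocs)]
    ]


if __name__ == "__main__":
    # Redistribute 6 elements from CYCLIC(1) to CYCLIC(2) on 3 processors.
    n, nprocs = 6, 3
    for mapping in ([0, 1, 2], [0, 2, 1], [1, 0, 2]):
        print(mapping, localized_elements(n, 1, 2, nprocs, mapping))
```

In this toy case the identity mapping keeps only elements 0 and 5 in place, while the mappings `[0, 2, 1]` and `[1, 0, 2]` each keep three elements in place but different ones (0, 2, 4 versus 1, 3, 5). This illustrates why the choice of mapping determines not only how much data is localized but also which elements are.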
Cite this article
Huang, JW., Chu, CP. A flexible processor mapping technique toward data localization for block-cyclic data redistribution. J Supercomput 45, 151–172 (2008). https://doi.org/10.1007/s11227-007-0166-9