Abstract
The array redistribution problem occurs in many important applications in parallel computing. In this paper, we consider this problem in a torus network. Tori are preferred to other multidimensional networks (like hypercubes) due to their better scalability (IEEE Trans. Parallel Distrib. Syst. 50(10), 1201–1218, 2001). We present a message combining approach that splits any array redistribution problem into a series of broadcasts in which all sources send messages of the same size, so that a balanced traffic load is achieved. Unlike existing array redistribution algorithms, the scheme introduced in this work eliminates the need for data reorganization in the memory of the source and target processors. Moreover, the processing of the scheduled broadcasts is pipelined, which reduces the total cost of redistribution.
References
Yang Y, Wang J (2001) Pipelined all-to-all broadcast in all-port meshes and tori. IEEE Trans Parallel Distrib Syst 50(10):1201–1218
Kaushik SD, Huang CH, Johnson RW, Sadayappan P (1994) An approach to communication-efficient data redistribution. In: Proceedings of the 8th ACM international conference on supercomputing, July 1994, Manchester, England
Park N, Prasanna VK, Raghavendra CS (1999) Efficient algorithms for block-cyclic array redistribution between processor sets. IEEE Trans Parallel Distrib Syst 10(12):1217–1240
Prylli L, Tourancheau B (1997) Fast runtime block cyclic data redistribution on multiprocessors. J Parallel Distrib Comput 45:63–72
Ramaswamy S, Banerjee P (1995) Automatic generation of efficient array redistribution routines for distributed memory multicomputers. In: Proc fifth symp frontiers of massively parallel computation, Feb 1995, pp 342–349
Wang L, Stichnoth JM, Chatterjee S (1996) Runtime performance of parallel array assignment: an empirical study. In: Proc 1996 ACM/IEEE supercomputing conf. http://www.supercomp.org/sc96/proceedings
Sundar NS, Jayasimha DN, Panda DK, Sadayappan P (2001) Hybrid algorithms for complete exchange in 2D meshes. IEEE Trans Parallel Distrib Syst 12(12):1201–1218
Kalns ET, Ni LM (1995) Processor mapping techniques toward efficient data redistribution. IEEE Trans Parallel Distrib Syst 6(12):1234–1247
Hsu C-H, Chung Y-C, Yang D-L, Dow C-R (2001) A generalized processor mapping technique for array redistribution. IEEE Trans Parallel Distrib Syst 12(7):743–757
Huang J-W, Chu C-P (2006) An efficient communication scheduling method for the processor mapping technique applied data redistribution. J Supercomput 37:297–318
Thakur R, Choudhary A, Ramanujam J (1996) Efficient algorithms for array redistribution. IEEE Trans Parallel Distrib Syst 7(6):587–594
Walker DW, Otto SW (1996) Redistribution of block-cyclic data distributions using MPI. Concur Practice Exp 8(9):707–728
Lim YW, Bhat PB, Prasanna VK (1998) Efficient algorithms for block cyclic redistribution of arrays. Algorithmica 24:298–330
Desprez F, Dongarra J, Petitet A, Randriamaro C, Robert Y (1998) Scheduling block-cyclic array redistribution. IEEE Trans Parallel Distrib Syst 9(2):192–205
Guo M, Nakata I (2001) A framework for efficient data redistribution on distributed memory multicomputers. J Supercomput 20:243–265
Tseng Y-C, Gupta SKS (1996) All-to-all personalized communication in a wormhole-routed torus. IEEE Trans Parallel Distrib Syst 7(5):498–505
Souravlas SI, Roumeliotis M (2004) A pipeline technique for dynamic data transfer on a multiprocessor grid. Int J Parallel Program 32(5):361–388
Cite this article
Souravlas, S., Roumeliotis, M. A message passing strategy for array redistributions in a torus network. J Supercomput 46, 40–57 (2008). https://doi.org/10.1007/s11227-008-0185-1
DOI: https://doi.org/10.1007/s11227-008-0185-1