A flexible processor mapping technique toward data localization for block-cyclic data redistribution

Huang, Jih-Woei; Chu, Chih-Ping

doi:10.1007/s11227-007-0166-9

Published: 28 December 2007

Volume 45, pages 151–172 (2008)

The Journal of Supercomputing

Abstract

Array redistribution is often required to execute a data-parallel program efficiently on distributed-memory multicomputers. To minimize the data transfer cost of redistribution, processor mapping techniques have been proposed that reduce the number of data elements that must be moved. These techniques require that the first data elements on each processor remain in place during the redistribution. To meet practical computational needs, however, a programmer may require other data elements to stay un-redistributed (localized) instead. In this paper, we propose a flexible processor mapping technique for block-cyclic redistribution that allows the programmer to localize the required data elements. We also present an efficient method for performing redistributions that employ the proposed technique. Our experimental results analyze the reduction in data transfer cost and the improvement in system performance achieved for redistributions with data localization.
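
The abstract does not give the authors' algorithm, so the following is only a minimal sketch of the underlying problem, not their technique. It assumes 0-based global indices, a redistribution from cyclic(x) to cyclic(y) over the same P processors, and that source logical processor s is physical processor s. A processor mapping is taken to be a permutation pi in which pi[d] is the physical processor that plays the role of logical destination processor d. The sketch exhaustively searches for the mapping that leaves the most elements in place, optionally forcing chosen indices to stay local. The names `owner`, `count_matrix`, `best_mapping`, and the `localize` parameter are hypothetical.

```python
from itertools import permutations


def owner(i: int, block: int, nprocs: int) -> int:
    """Logical processor that owns global index i under cyclic(block) over nprocs processors."""
    return (i // block) % nprocs


def count_matrix(n: int, x: int, y: int, nprocs: int) -> list[list[int]]:
    """c[s][d] = number of elements that source processor s holds under cyclic(x)
    and that logical destination processor d needs under cyclic(y)."""
    c = [[0] * nprocs for _ in range(nprocs)]
    for i in range(n):
        c[owner(i, x, nprocs)][owner(i, y, nprocs)] += 1
    return c


def best_mapping(n: int, x: int, y: int, nprocs: int, localize=()):
    """Return (pi, kept): pi[d] is the physical processor chosen to act as logical
    destination d, and kept is how many elements stay in place. Every index in
    `localize` is forced to stay on its current processor. Returns (None, -1) if
    the constraints cannot all be met. Exhaustive search: O(nprocs!), illustration only."""
    c = count_matrix(n, x, y, nprocs)
    # Localizing index k forces pi[dest(k)] == src(k).
    required: dict[int, int] = {}
    for k in localize:
        d, s = owner(k, y, nprocs), owner(k, x, nprocs)
        if required.setdefault(d, s) != s:
            return None, -1  # two indices demand different processors for the same d
    best, best_kept = None, -1
    for pi in permutations(range(nprocs)):
        if any(pi[d] != s for d, s in required.items()):
            continue  # violates a localization constraint
        kept = sum(c[pi[d]][d] for d in range(nprocs))
        if kept > best_kept:
            best, best_kept = pi, kept
    return best, best_kept


if __name__ == "__main__":
    # cyclic(2) -> cyclic(1), 12 elements, 3 processors
    print(best_mapping(12, 2, 1, 3))                   # ((0, 2, 1), 6)
    print(best_mapping(12, 2, 1, 3, localize=(3, 4)))  # ((1, 2, 0), 4)
```

In this small case the identity mapping keeps only 4 of the 12 elements in place, the unconstrained best mapping keeps 6, and forcing indices 3 and 4 to stay local lowers the best achievable count to 4. This illustrates the trade-off the paper studies between localizing programmer-chosen elements and minimizing data movement. The enumeration is only for clarity; a practical technique would compute the mapping directly rather than search all P! permutations.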

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701, Taiwan, ROC
Jih-Woei Huang & Chih-Ping Chu

Corresponding author

Correspondence to Jih-Woei Huang.

About this article

Cite this article

Huang, JW., Chu, CP. A flexible processor mapping technique toward data localization for block-cyclic data redistribution. J Supercomput 45, 151–172 (2008). https://doi.org/10.1007/s11227-007-0166-9

Received: 10 October 2006
Accepted: 10 December 2007
Issue Date: August 2008