
Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems

The Journal of Supercomputing

Abstract

In loops, some arrays are referenced through induction variables. To parallelize such loops, these induction variables must be substituted by expressions of the loop indices, and the substituted array references then become nonlinear expressions. The goal of data alignment is to map computations and data onto a set of virtual processors organized as a Cartesian grid (a template, in HPF terms) and to provide data locality for parallelizing compilers so that the communication cost of data accesses is minimized. Most existing data alignment methods are designed for arrays referenced with linear or quadratic subscripts in n loop index variables, and these methods are well developed. Little work, however, has addressed nonlinear expressions of index variables. This paper proposes a new communication-free data alignment technique for arrays referenced with exponential subscripts in n loop index variables or with other complex nonlinear expressions. Experimental results on the SPEC95FP benchmarks indicate that the proposed techniques can improve the execution time of the subroutines in these benchmarks.
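As a rough illustration of the two ideas in the abstract, the sketch below first shows how substituting an induction variable turns an array reference into an exponential subscript, and then checks a toy alignment for the communication-free property. It is not the paper's algorithm. The loop, the function names, and the brute-force ownership check are all assumptions made for illustration. Here "communication-free" is taken to mean that every array element is touched only by iterations placed on a single virtual processor.

```python
def subscripts_with_induction(n):
    """Original loop: k is an induction variable carried across iterations."""
    k = 1
    refs = []
    for _ in range(1, n + 1):
        k = 2 * k        # iteration i needs the value of k from iteration i-1
        refs.append(k)   # stands for a reference A[k]
    return refs


def subscripts_closed_form(n):
    """After substitution: the subscript depends on i alone, i.e. A[2**i]."""
    return [2 ** i for i in range(1, n + 1)]


def is_communication_free(n, refs, processor_of_iteration):
    """Brute-force check (hypothetical helper, not from the paper).

    refs is a list of (array_name, subscript_function) pairs.
    Returns True if no array element is accessed from two different
    virtual processors over iterations 1..n.
    """
    owner = {}
    for i in range(1, n + 1):
        p = processor_of_iteration(i)
        for name, subscript in refs:
            element = (name, subscript(i))
            # The first processor to touch an element becomes its owner.
            if owner.setdefault(element, p) != p:
                return False
    return True


# Loop body A[2**i] = B[2**i] + 1, with iteration i on virtual processor i.
same_subscripts = [("A", lambda i: 2 ** i), ("B", lambda i: 2 ** i)]
# Loop body A[2**i] = A[2**(i+1)] + 1: neighbouring iterations share elements.
shifted_subscripts = [("A", lambda i: 2 ** i), ("A", lambda i: 2 ** (i + 1))]
```

With one iteration per processor, `is_communication_free(5, same_subscripts, lambda i: i)` holds because each element is used by exactly one iteration, while `shifted_subscripts` fails under the same mapping because `A[4]` is used by iterations 1 and 2. Placing every iteration on one processor always passes the check but gives up all parallelism, which is why alignment methods look for non-trivial mappings.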



Author information


Correspondence to Minyi Guo.


About this article

Cite this article

Guo, M., Chang, WL., Jiang, B. et al. Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems. J Supercomput 60, 4–30 (2012). https://doi.org/10.1007/s11227-009-0280-y


  • DOI: https://doi.org/10.1007/s11227-009-0280-y
