Abstract
In loops, some arrays are referenced through induction variables. To parallelize such loops, the induction variables must be substituted by expressions of the loop indices, and the substituted array references then become nonlinear expressions. The goal of data alignment is to map computations and data onto a set of virtual processors organized as a Cartesian grid (a template, in HPF terms) and to provide data locality for parallelizing compilers so that the communication cost of data accesses is minimized. Most existing data alignment methods are designed for arrays referenced with linear or quadratic subscripts in n loop index variables, and these methods are well developed. Little work, however, has addressed nonlinear expressions of the index variables. This paper proposes a new communication-free data alignment technique for arrays referenced with exponential subscripts in n loop index variables or with other complex nonlinear expressions. Experimental results on the SPEC95FP benchmarks indicate that the proposed techniques can improve the execution time of the subroutines in these benchmarks.
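As a rough illustration of the terms used in the abstract, the sketch below shows how substituting a multiplicative induction variable turns a subscript into an exponential expression of the loop index, and what "communication-free" means for one hand-picked placement. The loop, the function names, and the owner mappings are all hypothetical choices made for this example; they are not taken from the paper and do not represent its alignment algorithm.

```python
def subscripts_with_induction_variable(n, k0=1):
    """Subscripts touched when the index is carried by an induction
    variable that doubles on every iteration (k = k * 2)."""
    k = k0
    touched = []
    for _ in range(n):
        touched.append(k)  # stands for a reference such as A[k]
        k *= 2             # loop-carried update: iterations depend on each other
    return touched


def subscripts_after_substitution(n, k0=1):
    """Same subscripts in closed form: iteration i references k0 * 2**i,
    so no value has to flow from one iteration to the next."""
    return [k0 * 2 ** i for i in range(n)]


def owner_of_a(x):
    # Hypothetical placement: element A[2**p] lives on virtual processor p.
    return x.bit_length() - 1


def owner_of_b(y):
    # Hypothetical placement: element B[2**(p + 1)] lives on virtual processor p.
    return y.bit_length() - 2


def is_communication_free(n):
    """Assume iteration i runs on processor i and executes
    A[2**i] = B[2**(i + 1)]. The loop is communication-free under the
    placements above if both elements live on processor i."""
    return all(
        owner_of_a(2 ** i) == i == owner_of_b(2 ** (i + 1))
        for i in range(n)
    )
```

Calling `subscripts_with_induction_variable(6)` and `subscripts_after_substitution(6)` yields the same list of powers of two, which is why the substituted reference can be written as an exponential subscript; `is_communication_free(6)` then checks that, for this particular toy placement, every iteration finds both of its operands on its own processor.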
Cite this article
Guo, M., Chang, WL., Jiang, B. et al. Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems. J Supercomput 60, 4–30 (2012). https://doi.org/10.1007/s11227-009-0280-y
DOI: https://doi.org/10.1007/s11227-009-0280-y