Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems

Guo, Minyi; Chang, Weng-Long; Jiang, Bo; Huang, Shu-Chien; Tsai, Sien-Tang; Ho, Michael (Shan-Hui)

doi:10.1007/s11227-009-0280-y

Published: 25 March 2009

Volume 60, pages 4–30 (2012)

The Journal of Supercomputing

Abstract

In loops, some arrays are referenced through induction variables. To parallelize such loops, those induction variables must be substituted, and the substituted array references then become nonlinear expressions of the loop indices. The goal of data alignment is to map computations and data onto a set of virtual processors organized as a Cartesian grid (a template, in HPF terms) and to provide data locality for parallelizing compilers so that the communication cost of data accesses is minimized. Most data alignment methods are devised for arrays referenced with linear or quadratic subscripts in n loop index variables, and these methods are well developed. Little work, however, has addressed nonlinear expressions of index variables. This paper proposes a new communication-free data alignment technique for arrays referenced with exponential subscripts in n loop index variables or with other complex nonlinear expressions. Experimental results on the SPEC95FP benchmarks indicate that the proposed techniques can improve the execution time of subroutines in these benchmarks.
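As a rough illustration of how an exponential subscript can arise, the sketch below substitutes a geometric induction variable and then checks a toy alignment. It is not taken from the paper: the function names, the doubling update `k = k * 2`, and the alignment that places element `A[2**i]` on virtual processor `i` are all assumptions chosen for illustration only.

```python
def loop_with_induction_variable(n):
    """Original form: k is updated in every iteration, so iteration i
    depends on iteration i - 1 (a loop-carried dependence)."""
    subscripts = []
    k = 1
    for i in range(n):
        subscripts.append(k)  # stands for a reference A[k]
        k = k * 2             # geometric induction variable (assumed update)
    return subscripts


def loop_after_substitution(n):
    """After substitution: the subscript is the closed form 2**i, which
    depends only on i, so the iterations are independent."""
    return [2 ** i for i in range(n)]


def owner_of_iteration(i):
    # Hypothetical computation mapping: iteration i runs on virtual processor i.
    return i


def owner_of_a(index):
    # Hypothetical data mapping: A[2**i] is placed on virtual processor i.
    # Only powers of two are referenced in this toy loop.
    return index.bit_length() - 1


n = 8
# The substituted subscripts match the ones produced by the induction variable.
assert loop_with_induction_variable(n) == loop_after_substitution(n)

# Every iteration finds its element of A on its own processor,
# so this toy mapping needs no interprocessor communication.
communication_free = all(
    owner_of_a(2 ** i) == owner_of_iteration(i) for i in range(n)
)
print(communication_free)
```

The point of the sketch is only that the subscript `2**i` is nonlinear in the loop index, so alignment conditions written for linear or quadratic subscripts do not apply to it directly.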

Author information

Authors and Affiliations

School of Information Engineering, Dalian Maritime University, Dalian, Liaoning, 116026, China
Minyi Guo & Bo Jiang
Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, 415 Chien Kung Road, Kaohsiung, 807, Taiwan, ROC
Weng-Long Chang
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu City, Fukushima, 965-8580, Japan
Minyi Guo
Department of Computer Science, National PingTung University of Education, PingTung, Taiwan, ROC
Shu-Chien Huang
Department of Information Management, Southern Taiwan University of Technology, Tainan County, 710, Taiwan, ROC
Sien-Tang Tsai
Department of Information Management, School of Information Technology, Ming Chuan University, 5, Teh-Ming Rd., Gwei-Shan, Taoyuan, 333, Taiwan, ROC
Michael (Shan-Hui) Ho

Corresponding author

Correspondence to Minyi Guo.

About this article

Cite this article

Guo, M., Chang, WL., Jiang, B. et al. Communication-free data alignment for arrays with exponential references in parallelizing compilers for scalable parallel systems. J Supercomput 60, 4–30 (2012). https://doi.org/10.1007/s11227-009-0280-y

Received: 06 August 2008
Accepted: 23 February 2009
Published: 25 March 2009
Issue Date: April 2012
DOI: https://doi.org/10.1007/s11227-009-0280-y
