Abstract
In achieving high performance on modern architectures it is critical to make effective use of the memory hierarchy. There are compiler-directed locality enhancement techniques that allow the transformation of program to achieve a higher locality: loop transformations, which are constrained by data dependences and data layout transformations, which have a global impact on the program locality. Due to these drawbacks, there must be a unification of the two techniques to achieve the benefits of both. In this paper, a novel unification of these techniques is presented. Using a model based on parameterized polyhedra and introducing new concepts, we propose a data locality optimization algorithm.
In comparison with the other approaches, the technique proposed is capable of solving more conflicts and optimizing more references, a subtle way is proposed to optimize incompatible references to the same array, in the same loop, and also references in a cycle in the interference graph. Using parameterized cost functions, our technique estimates the importance of each sub-graph and optimizes data locality. Our experimental results show a significant improvement over the prior approaches.
Similar content being viewed by others
References
Kandemir M, Choudhary A, Ramanujam J, Banerjee P (1999) A matrix-based approach to global locality optimization. J Parallel Distrib Comput 58(2):190–235
Loechner V, Meister B, Clauss P (2002) Precise data locality optimization of nested loops. J Supercomput 21(1):37–76
Allen R, Kennedy K (2001) Optimizing compilers for modern architectures. Kaufmann, San Mateo
Bastoul C, Feautrier P (2003) Improving data locality by chunking. In: CC’12 int conf on compiler construction. Lecture notes in computer science, vol 2622, pp 320–335
Bik A, Knijnenburg P, Wijshoff H (1994) Reshaping access patterns for generating sparse codes. In: Proc of the 7th int workshop on languages and compilers for parallel computing, NY, USA, pp 406–422
Coleman S, McKinley K (1995) Tile size selection using cache organization and data layout. In: Proc of the ACM SIGPLAN conf on programming language design and implementation (PLDI ’95), La Jolla, California, USA, pp 279–290
Li W (1993) Compiling for NUMA parallel machines. PhD thesis, Computer Science Department, Cornell University, NY
Manjikian N, Abdelrahman T (1995) Fusion of loops for parallelism and locality. In: Proc of the 24th int conf on parallel processing (ICPP’95), Oconomowoc, Wisconsin, vol II, pp 19–28
McKinley K, Carr S, Tseng C (1996) Improving data locality with loop transformations. ACM Trans Program Lang Syst, 18(4):424–453
Wolf M, Lam M (1991) A data locality optimizing algorithm. In: Proc of the SIGPLAN ’91 conf on programming language design and implementation, Toronto, Ontario, pp 30–44
Wolfe M (1989) More iteration space tiling. In: Proc of supercomputing ’89, Reno, Nevada, pp 655–664
Kandemir M, Choudhary A, Shenoy N, Banerjee P, Ramanujam J (1998) A hyperplane based approach for optimizing spatial locality in loop nests. In: Proc of the 1998 ACM int conf on supercomputing (ICS’98), Melbourne, Australia, pp 69–76
Kandemir M, Choudhary A, Shenoy N, Banerjee P, Ramanujam J (1999) A linear algebra framework for automatic determination of optimal data layouts. IEEE Trans Parallel Distrib Syst 10(2):115–135
O’Boyle M, Knijnenburg P (1999) Non-singular data transformations: definition, validity, and applications. Int J Parallel Program 27(1):131–159
Rivera G, Tseng C (1998) Data transformations for eliminating conflict misses. In: The ACM SIGPLAN conf on programming language design and implementation (PLDI’98), Montreal, Canada, pp 38–49
Kandemir M (2004) Improving whole-program locality using intra-procedural and inter-procedural transformations. J Parallel Distrib Comput 65(7):564–582
Kandemir M, Choudhary A, Ramanujam J, Banerjee P (1998) Improving locality using loop and data transformations in an integrated framework. In: Proc of the 31st int symp on microarchitecture (MICRO-31), Dallas, Texas, pp 285–296
O’Boyle M, Knijnenburg P (2002) Integrating loop and data transformations for global optimization. J Parallel Distrib Comput 62(4):563–590
Kandemir M, Choudhary A, Ramanujam J, Banerjee P (1999) A graph based framework to detect optimal memory layouts for improving data locality. In: Proc of the 1999 int parallel processing symp (IPPS’99), San Juan, Puerto Rico, pp 738–743
Blume W, Eigenmann R (1994) An overview of symbolic analysis techniques needed for the effective parallelization of the PERFECT benchmarks. In: Proc of the 1994 int conf on parallel processing (ICPP’94), North Carolina State University, vol II, pp 233–238
Loechner V, Meister B, Clauss P (2001) Data sequence locality: a generalization of temporal locality. In: Proc of the 7th int Euro-Par conf Manchester on parallel processing, UK, pp 262–272
Loechner V, Wilde D (1997) Parameterized polyhedra and their vertices. Int J Parallel Program, 25(6):525–549
Anderson J, Lam M (1993) Global optimizations for parallelism and locality on scalable parallel machines. In: Proc SIGPLAN conf on programming language design and implementation (PLDI’93), Albuquerque, New Mexico, pp 112–125
Wilde D (1993) A library for doing polyhedral operations. Technical report PI 785, IRISA, Rennes, France
Bastoul C, Cohen A, Girbal A, Sharma S, Temam O (2003) Putting polyhedral loop transformations to work. In: Proc of the workshop on languages and compilers for parallel computing (LCPC’03), Texas, USA. Lecture notes in computer science, vol 2558, pp 23–30
Feautrier P (1996) Automatic parallelization in the polytope model. In: The data parallel programming model. Lecture notes in computer science, vol 1132, pp 79–103
Ramanujam J (1992) A linear algebraic view of loop transformations and their interaction. In: Sorensen D (ed) Proc of the 5th SIAM conf on parallel processing for scientific computing. SIAM, Philadelphia, pp 543–548
Schrijver A (1986) Theory of linear and integer programming. Wiley, New York
Ancourt C, Irigoin F (1991) Scanning polyhedra with DO loops. In: Proc of the 3rd ACM SIGPLAN symp principle and practice of parallel programming, Williamsburg, USA, pp 39–50
Bastoul C (2002) Generating loops for scanning polyhedra—CLooG user’s guide. Technical report 2002/23, PRiSM, Versailles University, France
Le Verge H, Van Dongen V, Wilde D (1994) Loop nest synthesis using the polyhedral library. Technical report 830, IRISA, Rennes, France
Rajopadhye S, Quilleré F, Wilde D (2000) Generation of efficient nested loops from polyhedra. Int J Parallel Program, 28(5):469–498
Wolfe M (1996) High performance compilers for parallel computing. Addison-Wesley, Reading
Sam SV (2009) A bijective proof for a theorem of Ehrhart. Am Math Mon 116(8):688–701
Clauss P (1996) Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs. Research report ICPS 96-03, 10th ACM int conf on Supercomputing (ICS’96), Philadelphia, Pennsylvania, USA, pp 278–285
Ancourt C, Irigoin F, Yang Y (1995) Minimal data dependence abstractions for loop transformations. Int J Parallel Program, 23(4):359–388
Kandemir M (2004) Impact of data transformations on memory bank locality. In: Proc of the design, automation and test in Europe conf and exhibition (DATE’ 04), Paris, France, vol 1, pp 10506–10511
Gilbert W (1993) Bricklaying and the Hermite normal form. Am Math Mon 100(3):242–245
Zhan X (2006) Completion of a partial integral matrix to a unimodular matrix. Linear Algebra Appl 414(1):373–377
Mohar B, Poljak S (1993) Eigenvalues in combinatorial optimization: combinatorial and graph-theoretical problems in linear algebra. IMA Vol Math Appl 50:107–151
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Motallebi, H., Parsa, S. Data locality optimization of interference graphs based on polyhedral computations. J Supercomput 61, 935–965 (2012). https://doi.org/10.1007/s11227-011-0660-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-011-0660-y