Reuse-driven tiling for data locality

Xue, Jingling; Huang, Chua-Huang

doi:10.1007/BFb0032681

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1366)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

This paper applies unimodular transformations and tiling to improve the data locality of a loop nest. Data dependences limit which loops can be tiled, and reuse information determines which loops are worth tiling, so not every loop in the nest is tiled. The approach proposed here therefore attempts to capture as much data reuse in the cache as possible while tiling as few loops as possible. Data dependences are represented by cones and reuse information by vector spaces, and on this basis a reuse-driven approach to improving data locality is presented. In the special case of a single fully permutable loop nest, the data locality problem is formulated as an optimisation problem and solved optimally. In the general case, an algorithm is presented that attempts to construct the tiled loop nest so that as much reuse as possible is carried in the innermost tiled loops.
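As a rough illustration of what tiling only some loops of a nest looks like, the sketch below tiles two of the three loops of a matrix multiplication and leaves the third untiled. It is not the algorithm of the paper: the choice of matrix multiplication, the function names `matmul_untiled` and `matmul_tiled`, and the `tile` parameter are all assumptions made for this example, and the tile size is not derived from any cache model.

```python
def matmul_untiled(a, b, n):
    """Reference i-j-k loop nest computing c = a * b for n x n lists."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation with only the j and k loops tiled.

    The i loop is left untiled. The tile-controlling loops (jj, kk) are
    outermost, so each tile x tile block of b is reused for every i
    before the nest moves on to the next block.
    """
    c = [[0] * n for _ in range(n)]
    for jj in range(0, n, tile):          # steps over column blocks of b
        for kk in range(0, n, tile):      # steps over row blocks of b
            for i in range(n):            # untiled loop
                for j in range(jj, min(jj + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        c[i][j] += a[i][k] * b[k][j]
    return c
```

With integer inputs the two functions return identical results, because tiling only reorders the additions into each `c[i][j]`. Which loops to tile, and in what order to place them, is the question the paper addresses; here the choice is fixed by hand.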

Author information

Authors and Affiliations

School of Mathematical and Computer Sciences, University of New England, 2351, Armidale, NSW, Australia
Jingling Xue
Department of Computer and Information Sciences, The Ohio State University, 43210-1277, Ohio, USA
Chua-Huang Huang

Editor information

Zhiyuan Li, Pen-Chung Yew, Siddharta Chatterjee, Chua-Huang Huang, P. Sadayappan, David Sehr

Cite this paper

Xue, J., Huang, CH. (1998). Reuse-driven tiling for data locality. In: Li, Z., Yew, PC., Chatterjee, S., Huang, CH., Sadayappan, P., Sehr, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1997. Lecture Notes in Computer Science, vol 1366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032681

DOI: https://doi.org/10.1007/BFb0032681
Published: 09 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64472-9
Online ISBN: 978-3-540-69788-6
eBook Packages: Springer Book Archive
