Reuse-driven tiling for data locality

Languages and Compilers for Parallel Computing (LCPC 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1366))

Abstract

This paper applies unimodular transformations and tiling to improve the data locality of a loop nest. Because of data dependences and the reuse available in the program, not every loop can be tiled, and not every loop is worth tiling. The proposed approach therefore attempts to capture as much data reuse in the cache as possible while tiling as few loops as possible. Data dependences are represented by cones and reuse information by vector spaces, and together these support a reuse-driven approach to improving the data locality of the program. In the special case of a single fully permutable loop nest, the data locality problem is formulated as an optimisation problem and solved optimally. In the general case, an algorithm is presented that attempts to construct the tiled loop nest so that as much reuse as possible is carried in the innermost tiled loops.
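
To make the idea of tiling only some loops concrete, the following Python sketch tiles the j and k loops of a matrix multiplication and leaves the i loop untiled, so that a tile-by-tile block of B is reused across all iterations of i. It is a textbook illustration only, not the algorithm of this paper; the function names and the `tile` parameter are assumptions introduced here.

```python
def matmul_untiled(A, B, n):
    """Reference i-j-k loop nest: C = A * B for n x n integer matrices."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, n, tile):
    """Same computation with only the j and k loops tiled (hypothetical sketch).

    The two outer loops (kk, jj) step over tiles; the i loop is left untiled.
    Within one (kk, jj) tile, the block B[kk:kk+tile][jj:jj+tile] is touched
    again for every value of i, which is the reuse the tiling tries to keep
    in cache.
    """
    C = [[0] * n for _ in range(n)]
    for kk in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(n):
                # min(...) handles a partial tile when tile does not divide n.
                for k in range(kk, min(kk + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        C[i][j] += A[i][k] * B[k][j]
    return C
```

Reordering the iterations is legal here because the only dependence is the accumulation into C[i][j] along k, so the nest is fully permutable. With integer entries the two functions return identical results for any positive `tile`.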

Editor information

Zhiyuan Li, Pen-Chung Yew, Siddhartha Chatterjee, Chua-Huang Huang, P. Sadayappan, David Sehr

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xue, J., Huang, CH. (1998). Reuse-driven tiling for data locality. In: Li, Z., Yew, PC., Chatterjee, S., Huang, CH., Sadayappan, P., Sehr, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1997. Lecture Notes in Computer Science, vol 1366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032681

  • DOI: https://doi.org/10.1007/BFb0032681

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64472-9

  • Online ISBN: 978-3-540-69788-6

  • eBook Packages: Springer Book Archive
