Quantifying the multi-level nature of tiling interactions

  • Data Locality
  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1366)

Abstract

Optimizations, including tiling, often target a single level of memory or parallelism, such as cache. These optimizations usually operate on a level-by-level basis, guided by a cost function parameterized by features of that single level. The benefit of optimizations guided by these one-level cost functions decreases as architectures tend towards a hierarchy of memory and of parallelism. We have identified three common architectural scenarios where a single tiling choice could be improved by using information from multiple levels in concert. For the first two scenarios, we derive multi-level cost functions which guide the optimal choice of tile size and shape, and quantify the improvement gained. We give both analysis and simulation results to support our points. For the third scenario, we summarize our findings.

This work was supported in part by NSF CCR-9504150 and a UC MICRO grant in association with the Intel Corporation.



Editor information

Zhiyuan Li, Pen-Chung Yew, Siddharta Chatterjee, Chua-Huang Huang, P. Sadayappan, David Sehr

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mitchell, N., Carter, L., Ferrante, J., Högstedt, K. (1998). Quantifying the multi-level nature of tiling interactions. In: Li, Z., Yew, PC., Chatterjee, S., Huang, CH., Sadayappan, P., Sehr, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1997. Lecture Notes in Computer Science, vol 1366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0032680

  • DOI: https://doi.org/10.1007/BFb0032680

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64472-9

  • Online ISBN: 978-3-540-69788-6

  • eBook Packages: Springer Book Archive
