
A Quantitative Analysis of Tile Size Selection Algorithms

The Journal of Supercomputing

Abstract

Loop tiling is an effective optimizing transformation for improving the memory performance of programs, especially dense-matrix scientific computations. The magnitude and stability of the resulting performance improvements depend heavily on the choice of tile sizes. Many existing tile selection algorithms try to find tile sizes that eliminate self-interference cache conflict misses, maximize cache utilization, and minimize cross-interference cache conflict misses. These techniques depend heavily on the actual layout of the arrays in memory. Array padding, an effective data layout optimization, is therefore incorporated into many algorithms to stabilize the effectiveness of loop tiling by avoiding “pathological” array sizes.
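
The interplay between tile size, cache mapping, and array padding can be illustrated with a deliberately simplified model. The sketch below is not any of the algorithms examined in the paper. It assumes a direct-mapped cache, one array element per cache line, and column-major (Fortran-style) arrays, and all function names are hypothetical. It checks a tile for self-interference by brute force, finds the largest conflict-free square tile, and then searches for the smallest padding of the leading dimension that admits a tile of a requested size.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if a ti x tj tile of a column-major array
   with leading dimension ld has no self-interference in a direct-mapped
   cache holding cache_elems elements. Simplifying assumption: one element
   per cache line, so element (i, j) maps to set (j*ld + i) mod cache_elems. */
static int tile_is_conflict_free(long ld, long ti, long tj, long cache_elems)
{
    assert(cache_elems > 0);
    if (ti * tj > cache_elems)
        return 0; /* pigeonhole: the tile cannot fit without a conflict */
    unsigned char *used = calloc((size_t)cache_elems, 1);
    if (used == NULL)
        return 0;
    int ok = 1;
    for (long j = 0; j < tj && ok; j++) {
        for (long i = 0; i < ti; i++) {
            long set = (j * ld + i) % cache_elems;
            if (used[set]) { /* two tile elements share a cache set */
                ok = 0;
                break;
            }
            used[set] = 1;
        }
    }
    free(used);
    return ok;
}

/* Largest t (t <= ld) such that a t x t tile is free of self-interference.
   Any subset of a conflict-free tile is also conflict-free, so the search
   can stop at the first failing size. */
static long largest_square_tile(long ld, long cache_elems)
{
    long best = 0;
    for (long t = 1; t <= ld && t * t <= cache_elems; t++) {
        if (!tile_is_conflict_free(ld, t, t, cache_elems))
            break;
        best = t;
    }
    return best;
}

/* Smallest padding p in [0, max_pad] such that leading dimension ld + p
   admits a conflict-free square tile of at least want x want; -1 if none. */
static long pad_for_tile(long ld, long cache_elems, long want, long max_pad)
{
    for (long p = 0; p <= max_pad; p++)
        if (largest_square_tile(ld + p, cache_elems) >= want)
            return p;
    return -1;
}
```

With a 1024-element cache and a leading dimension of 1024, every column starts in the same cache set, so `largest_square_tile(1024, 1024)` returns 1. This is a "pathological" size. Padding the leading dimension by 16 (`pad_for_tile(1024, 1024, 16, 64)` returns 16) admits a 16 x 16 tile, and a leading dimension of 1056 admits a 32 x 32 tile that fills the cache exactly. Practical algorithms replace the brute-force check with closed-form reasoning. They also account for line size, associativity, and cross-interference, all of which this sketch ignores.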

In this paper, we examine several such combined algorithms in terms of cost-benefit trade-offs and introduce a new algorithm. Preliminary experimental results show that more precise and costly tile selection and array padding algorithms may not be justified by the resulting performance improvements, since such improvements may also be achieved by much simpler, and therefore less expensive, strategies. The key issues in finding a good tiling algorithm are (1) identifying the critical performance factors and (2) developing corresponding performance models that predict performance with sufficient accuracy. Following this insight, we have developed a new tiling algorithm that outperforms previous algorithms in terms of its own execution time and stability, and that generates code whose performance is comparable to that of the best measured algorithm. Experimental results on two standard benchmark kernels, matrix multiply and LU factorization, show that the new algorithm is orders of magnitude faster than the best previous algorithm without sacrificing the stability or execution speed of the generated code.
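
Matrix multiply, one of the two benchmark kernels used in the evaluation, is the canonical target for loop tiling. The following is a minimal sketch of a square-tiled kernel in C. It assumes row-major n x n matrices and a single tile size t for all three loops. In practice, t would come from whichever tile size selection algorithm is in use. The function names are hypothetical and are not taken from the paper's benchmark code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* C += A * B for n x n row-major matrices, using t x t tiles on all three loops. */
static void matmul_tiled(size_t n, size_t t, const double *A, const double *B, double *C)
{
    assert(t > 0); /* t == 0 would never advance the tile loops */
    for (size_t ii = 0; ii < n; ii += t)
        for (size_t kk = 0; kk < n; kk += t)
            for (size_t jj = 0; jj < n; jj += t) {
                /* Clip tile bounds when t does not divide n. */
                size_t imax = ii + t < n ? ii + t : n;
                size_t kmax = kk + t < n ? kk + t : n;
                size_t jmax = jj + t < n ? jj + t : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* Untiled reference kernel with the same i-k-j loop order. */
static void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Returns 1 if the tiled and untiled kernels agree on small integer-valued inputs. */
static int tiled_matches_naive(size_t n, size_t t)
{
    double *A = malloc(n * n * sizeof *A), *B = malloc(n * n * sizeof *B);
    double *C1 = calloc(n * n, sizeof *C1), *C2 = calloc(n * n, sizeof *C2);
    int ok = A && B && C1 && C2;
    if (ok) {
        for (size_t x = 0; x < n * n; x++) {
            A[x] = (double)(x % 7);
            B[x] = (double)(x % 5) - 2.0;
        }
        matmul_tiled(n, t, A, B, C1);
        matmul_naive(n, A, B, C2);
        for (size_t x = 0; x < n * n && ok; x++)
            ok = C1[x] == C2[x]; /* exact: integer-valued products and sums */
    }
    free(A); free(B); free(C1); free(C2);
    return ok;
}
```

Calling `tiled_matches_naive(37, 8)` exercises the clipped edge tiles, because 8 does not divide 37. Tiling changes only the order in which iterations execute, not their results. The tile size affects performance alone, which is why its selection matters.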

Cite this article

Hsu, Ch., Kremer, U. A Quantitative Analysis of Tile Size Selection Algorithms. The Journal of Supercomputing 27, 279–294 (2004). https://doi.org/10.1023/B:SUPE.0000011388.54204.8e
