Abstract
Loop tiling (or loop blocking) is a well-known loop transformation that improves temporal locality in nested loops that perform matrix computations. When targeting caches with low associativity, one of the key challenges for loop tiling is to minimize capacity misses and conflict misses simultaneously. This paper analyzes the effect of the tile size and the array-dimension size on capacity misses and conflict misses. The analysis supports the approach of combining tile-size selection (to minimize capacity misses) with array padding (to minimize conflict misses).
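As an illustration of the two ideas named in the abstract, the following Python sketch shows a tiled matrix multiplication and a small model of how the array-dimension size affects conflicts in a direct-mapped cache. It is not taken from the paper: the function names, the row-major layout, and the cache geometry (8 KB direct-mapped, 32-byte lines, 8-byte elements) are assumptions chosen only to make the effect visible.

```python
def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) with the k and j loops tiled.

    Each tile x tile block of b is reused across all rows i before the
    computation moves on, which is the temporal locality that tiling targets.
    """
    c = [[0.0] * n for _ in range(n)]
    for kk in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(n):
                for k in range(kk, min(kk + tile, n)):
                    aik = a[i][k]
                    for j in range(jj, min(jj + tile, n)):
                        c[i][j] += aik * b[k][j]
    return c


def tile_set_conflicts(ld, tile, elem_bytes=8, line_bytes=32, cache_bytes=8192):
    """Count colliding cache lines for one tile in a direct-mapped cache.

    ld is the leading (row) dimension of a row-major array, in elements.
    The tile starts at element (0, 0). The result is the number of distinct
    cache lines the tile touches minus the number of distinct cache sets they
    map to; zero means the tile fits without self-interference.
    All cache parameters are illustrative assumptions.
    """
    num_sets = cache_bytes // line_bytes
    lines = set()
    for i in range(tile):
        for j in range(tile):
            addr = (i * ld + j) * elem_bytes
            lines.add(addr // line_bytes)
    sets_used = {line % num_sets for line in lines}
    return len(lines) - len(sets_used)


if __name__ == "__main__":
    # A 16 x 16 tile of 8-byte elements occupies 2 KB, well under the 8 KB
    # capacity, yet with ld = 1024 every tile row maps to the same four sets.
    print(tile_set_conflicts(1024, 16))
    # Padding each row by 16 elements (ld = 1040) spreads the rows apart.
    print(tile_set_conflicts(1040, 16))
```

Under these assumed parameters, the unpadded array (row length 1024) gives 60 colliding lines for a 16 x 16 tile, while the padded array (row length 1040) gives none, even though the tile size, and therefore the capacity requirement, is unchanged. This is the sense in which tile-size selection and padding address different kinds of misses.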
Additional information
This work is sponsored in part by the National Science Foundation of the USA under Grant Nos. ST-HEC-0444285, CCR-950254, ACI/ITR-0082834 and CCR-9975309, by the Indiana 21st Century Fund, and by a donation from Sun Microsystems, Inc.
Cite this article
Li, Z. Simultaneous Minimization of Capacity and Conflict Misses. J Comput Sci Technol 22, 497–504 (2007). https://doi.org/10.1007/s11390-007-9069-8