Abstract
This article presents an algorithm that reduces cache conflicts and improves cache locality. The proposed algorithm analyzes the locality reference space of each reference pattern, partitions the multi-level cache into several parts of different sizes, and then maps array data onto the scheduled cache positions to eliminate cache conflicts. A greedy method for rearranging array variables in declaration statements is also developed to reduce the memory overhead of mapping arrays onto a partitioned cache. In addition, loop tiling is combined with the proposed schemes to exploit opportunities for both temporal and spatial reuse. Atom is used to build a simulator of a direct-mapped cache, which demonstrates that our approach is effective at reducing the number of cache conflicts and exploiting cache locality. Experimental results show that the cache partitioning scheme can greatly reduce cache conflicts and thus shorten program execution time in both single-level and multi-level cache hierarchies.
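The core intuition behind cache partitioning is that in a direct-mapped cache, arrays mapped onto disjoint cache regions cannot evict each other. The toy simulator below illustrates this; it is not the authors' algorithm or their Atom-based simulator. It assumes a single-level 1 KB direct-mapped cache with 32-byte lines, 8-byte array elements, and a loop that reads `a[i]` and `b[i]` in lockstep. `DirectMappedCache`, `partitioned_bases`, and `simulate` are hypothetical names introduced for illustration. `partitioned_bases` implements partitioning by padding each array's base address so that it starts at its own cache offset. That padding is exactly the kind of memory overhead the greedy declaration reordering described above is meant to reduce. The reordering itself is not modeled here.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one tag per line, no write modeling."""

    def __init__(self, size_bytes, line_bytes):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_bytes
        index = block % self.num_lines   # the only line this block may occupy
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:                            # compulsory or conflict miss
            self.misses += 1
            self.tags[index] = tag


def partitioned_bases(sizes, cache_bytes, start=0):
    """Hypothetical helper: give each array its own cache partition.

    Array k is assigned partition offset sum(sizes[:k]). Its base is the
    smallest address >= the end of the previous array whose offset modulo
    cache_bytes equals that partition offset (i.e., padding is inserted).
    """
    if sum(sizes) > cache_bytes:
        raise ValueError("arrays do not fit in the cache together")
    bases, next_free, offset = [], start, 0
    for size in sizes:
        pad = (offset - next_free) % cache_bytes   # padding bytes needed
        base = next_free + pad
        bases.append(base)
        next_free = base + size
        offset += size
    return bases


def simulate(bases, n, elem_bytes=8, cache_bytes=1024, line_bytes=32):
    """Run `for i in range(n): read a[i], b[i], ...` and count misses/hits."""
    cache = DirectMappedCache(cache_bytes, line_bytes)
    for i in range(n):
        for base in bases:
            cache.access(base + i * elem_bytes)
    return cache.misses, cache.hits


if __name__ == "__main__":
    # Pathological layout: b starts exactly one cache size after a.
    print("conflicting:", simulate([0, 1024], 64))
    # Each 512-byte array gets its own half of the cache.
    print("partitioned:", simulate(partitioned_bases([512, 512], 1024), 64))
```

With the two arrays exactly one cache size apart, `a[i]` and `b[i]` always map to the same line, so all 128 accesses miss. Once each array has its own half of the cache, only the 32 compulsory misses remain (one per line touched). If another variable already occupies the first 100 bytes, `partitioned_bases([512, 512], 1024, start=100)` returns `[1024, 1536]`. That layout spends 924 bytes on padding, which shows why the order of declarations matters.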
Cite this article
Chang, CY., Sheu, JP. & Chen, HC. Reducing Cache Conflicts by Multi-Level Cache Partitioning and Array Elements Mapping. The Journal of Supercomputing 22, 197–219 (2002). https://doi.org/10.1023/A:1014982819342