- 1. D. F. Bacon, S. L. Graham, and O. J. Sharp. Compiler transformations for high-performance computing. ACM Computing Surveys, 26(4):345-420, Dec. 1994.
- 2. R. Berrendorf and H. Ziegler. PCL: The Performance Counter Library: A Common Interface to Access Hardware Performance Counters on Microprocessors (Version 1.2). Technical Report FZJ-ZAM-IB-9816, 1998/99. Available at http://www.fz-juelich.de/zam/PCL/.
- 3. S. Chatterjee and S. Sen. Cache-efficient matrix transposition. In Proceedings of the Sixth IEEE International Symposium on High-Performance Computer Architecture, pages 195-205, 2000.
- 4. S. Coleman and K. S. McKinley. Tile size selection using cache organization and data layout. ACM SIGPLAN Notices, 30(6):279-290, June 1995.
- 5. D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. Journal of Parallel and Distributed Computing, 5(5):587-616, Oct. 1988.
- 6. S. Ghosh, M. Martonosi, and S. Malik. Cache miss equations: A compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems, 21(4):703-746, Nov. 1999.
- 7. M. Kandemir, J. Ramanujam, and A. Choudhary. Improving cache locality by a combination of loop and data transformations. IEEE Transactions on Computers, 48(2), 1999.
- 8. I. Kodukula, K. Pingali, R. Cox, and D. Maydan. An experimental evaluation of tiling and shackling for memory hierarchy management. In Proceedings of the ACM International Conference on Supercomputing, pages 482-490, 1999.
- 9. C. Leopold. Arranging statements and data of program instances for locality. Future Generation Computer Systems, 14:293-311, 1998.
- 10. C. Leopold. Generating structured program instances with a high degree of locality. In Proceedings of the 8th Euromicro Workshop on Parallel and Distributed Processing, pages 267-274. IEEE Computer Society Press, 2000.
- 11. K. S. McKinley, S. Carr, and C.-W. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424-453, July 1996.
- 12. S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, 1997.
- 13. N. Mukhopadhyay. On the Effectiveness of Feedback-Guided Parallelization. PhD thesis, University of Manchester, 1999.
- 14. G. Rivera and C.-W. Tseng. A comparison of compiler tiling algorithms. In Proceedings of the International Conference on Compiler Construction, pages 168-182. Springer LNCS 1575, 1999.
- 15. G. Rivera and C.-W. Tseng. Locality optimizations for multi-level caches. In Proceedings of SC'99, 1999. Available at http://w3.csc.ucm.es/Otros/sc99/techpap.htm.
- 16. O. Temam, E. D. Granston, and W. Jalby. To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In Proceedings of IEEE Supercomputing'93. IEEE Computer Society Press, 1993.
- 17. M. E. Wolf and M. S. Lam. A data locality optimizing algorithm. ACM SIGPLAN Notices, 26(6):30-44, June 1991.
- 18. M. J. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, 1996.
Index Terms
- Exploiting non-uniform reuse for cache optimization