Improving whole-program locality using intra-procedural and inter-procedural transformations☆,☆☆
References (62)
- S.P. Amarasinghe, J.M. Anderson, M.S. Lam, C.W. Tseng, The SUIF compiler for scalable parallel machines, in:...
- J. Anderson, Automatic computation and data decomposition for multiprocessors, Ph.D. dissertation, Stanford University,...
- J. Anderson, S. Amarasinghe, M. Lam, Data and computation transformations for multiprocessors, in: Proceedings of the...
- A. Bik, P. Knijnenburg, H. Wijshoff, Reshaping access patterns for generating sparse codes, in: Proceedings of the 7th...
- A. Bik, H. Wijshoff, On a completion method for unimodular matrices, Technical Report 94–14, Department of Computer...
- et al., Data-distribution support on distributed-shared memory multi-processors
- et al., Optimal evaluation of array expressions on massively parallel machines, ACM Trans. Programming Languages Systems (January 1995)
- M. Cierniak, W. Li, Unifying data and control transformations for distributed shared memory machines, in: Proceedings...
- M. Cierniak, W. Li, Inter-procedural array re-mapping, in: Proceedings of the International Conference on Parallel...
- S. Coleman, K. McKinley, Tile size selection using cache organization and data layout, in: Proceedings of the SIGPLAN...
- A methodology for procedure cloning, Comput. Languages
- Loop fusion for memory space optimization
- A novel approach towards automatic data distribution
- Dynamic data distribution with control flow analysis
- Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers, IEEE Trans. Parallel Distrib. Systems
- A compiler technique for improving whole program locality
- An integer linear programming approach for optimizing cache locality
- A matrix-based approach to the global locality optimization problem
- Improving locality using loop and data transformations in an integrated framework
- A framework for inter-procedural locality optimization
- Locality optimization algorithms for compilation of out-of-core codes, J. Inform. Sci. Engrg.
- Data relation vectors: a new abstraction for data optimizations
- A compiler algorithm for optimizing locality in loop nests
- Automatic data layout for high performance Fortran
Cited by (4)
- Data layout optimization for portable performance. 2015, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
- Data locality optimization of interference graphs based on polyhedral computations. 2012, Journal of Supercomputing
- Program locality analysis using reuse distance. 2009, ACM Transactions on Programming Languages and Systems
- Build to order linear algebra kernels. 2008, IPDPS Miami 2008 - Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, Program and CD-ROM
Mahmut Kandemir is an associate professor in the Computer Science and Engineering Department at the Pennsylvania State University. His main research interests are optimizing compilers, I/O-intensive applications, and power-aware computing. He received the B.Sc. and M.Sc. degrees in control and computer engineering from Istanbul Technical University, Istanbul, Turkey, in 1988 and 1992, respectively, and the Ph.D. in electrical engineering and computer science from Syracuse University, Syracuse, New York, in 1999. He is a member of the IEEE and the ACM.
- ☆ This work was funded (in part) by the NSF grant CCR-0093082 and the Pittsburgh Digital Greenhouse through a grant from the Commonwealth of Pennsylvania, Department of Community and Economic Development.
- ☆☆ A preliminary version of this paper appears in the 28th Annual ACM Symposium on Principles of Programming Languages [21]. This submission improves upon the POPL paper by (1) presenting a discussion of data transformations; (2) presenting statistics on relative loop nest execution frequencies; (3) discussing the shortcomings of the inter-procedural analysis; (4) discussing the relative merits of selective cloning in this context; (5) presenting experimental results on the different variants of our approach; and (6) presenting experimental data for the multi-processor case.