Abstract
We propose techniques for identifying and exploiting spatial and temporal reuse for indirectly indexed arrays. Indirectly indexed arrays are arrays that are typically accessed inside multilevel loop nests and whose index expressions include not only loop iterators and constants but also other arrays. Existing techniques for improving locality are quite sophisticated for directly indexed arrays, but they are inadequate for indirectly indexed arrays. In this article, we therefore extend the existing framework and techniques from directly indexed arrays to indirectly indexed arrays. The concepts of reuse subspace, dependence vector, self reuse, and group reuse are extended and applied in this new context. In addition, scratch-pad memory has recently become an attractive alternative to the data cache, especially in the embedded multimedia community. This is because embedded systems are very sensitive to area and energy, and a scratch-pad occupies less area and consumes less energy per access than a cache of the same capacity. Several techniques have been proposed for efficiently exploiting the scratch-pad for directly indexed arrays. We extend these techniques by presenting a method for mapping indirectly indexed arrays to the scratch-pad. This enables the scratch-pad to be used in a broader context than was previously possible.
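To make the definition of an indirectly indexed array concrete, the following minimal sketch shows an access of the form `A[B[i]]`, where the index of `A` is itself read from another array. It is an illustration only and not the analysis proposed in the article: it counts temporal reuse at run time by inspecting the index values, whereas the article concerns compiler techniques. The arrays `A` and `B` and the helper `temporal_reuse_count` are hypothetical names chosen for this example.

```python
def temporal_reuse_count(accesses):
    """Count accesses that touch an element already touched earlier.

    `accesses` is the sequence of element indices visited by the loop.
    This is a run-time illustration, not a compile-time reuse analysis.
    """
    seen = set()
    reused = 0
    for idx in accesses:
        if idx in seen:
            reused += 1      # same element touched again: temporal reuse
        else:
            seen.add(idx)    # first touch of this element
    return reused


# Hypothetical data: A is the data array, B is the index array.
A = [10, 20, 30, 40]
B = [3, 0, 3, 1, 0, 3]

# Direct indexing: the subscript depends only on the loop iterator.
direct_total = sum(A[i] for i in range(len(A)))

# Indirect indexing: the subscript of A is read from B.
indirect_total = sum(A[B[i]] for i in range(len(B)))

# Elements of A touched by the indirect loop, in order.
accesses = [B[i] for i in range(len(B))]
reuse = temporal_reuse_count(accesses)
distinct = len(set(accesses))
```

In this example the indirect loop performs six accesses but touches only three distinct elements of `A`, so three of the accesses are reuses. Reuse of this kind cannot be read off the loop iterator alone, because it depends on the contents of `B`.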
Index Terms
- Reuse analysis of indirectly indexed arrays