Abstract
This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops such that cache locality is improved. We develop a new memory cost model to analyze data reuse in terms of both the cache and the TLB, based on which we compute the tile size with or without array duplication. We determine whether to duplicate arrays for tiling by comparing the respective exploited reuse factors. The preliminary results with several benchmark programs show that the transformed programs achieve a speedup of 1.09 to 3.82 over the original programs.
This work is sponsored in part by National Science Foundation through grants CCR-9975309, CCR-950254, MIP-9610379 and by Purdue Research Foundation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
D. Bacon, J.-H. Chow, D. Ju, K. Muthukumar, and V. Sarkar. A compiler framework for restructuring data declarations to enhance cache and tlb effectiveness. In Proceedings of CASCON’94, Toronto, Ontario, October 1994.
J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In Proceedings of 4th International Workshop on Languages and Compilers for Parallel Computing, August 1991. Also in Lecture Notes in Computer Science, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., pp. 328–341, Springer-Verlag, Aug. 1991.
Somnath Ghosh, Margaret Martonosi, and Sharad Malik. Precise miss analysis for program transformations with caches of arbitrary associativity. In Proceedings of the 8th ACM Conference on Architectural Support for Programming Languages and Operating Systems, pages 228–239, San Jose, California, October 1998.
Junjie Gu, Zhiyuan Li, and Gyungho Lee. Experience with efficient array data flow analysis for array privatization. In Proceedings of the 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 157–167, Las Vegas, NV, June 1997.
Induprakas Kodukula, Nawaaz Ahmed, and Keshav Pingali. Data-centric multi-level blocking. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 346–357, Las Vegas, NV, June 1997.
Induprakas Kodukula and Keshav Pingali. Transformations of imperfectly nested loops. In Proceedings of Supercomputing, November 1996.
Naraig Manjikian and Tarek Abdelrahman. Fusion of loops for parallelism and locality. IEEE Transactions on Parallel and Distributed Systems, 8(2):193–209, February 1997.
John McCalpin and David Wonnacott. Time Skewing: A Value-Based Approach to Optimizing for Memory Locality. http://www.haverford.edu/cmsc/davew/cache-opt/cache-opt.html.
Nicholas Mitchell, Karin Högstedt, Larry Carter, and Jeanne Ferrante. Quantifying the multi-level nature of tiling interactions. International Journal of Parallel Programming, 26(6):641–670, December 1998.
Gabriel Rivera and Chau-Wen Tseng. Eliminating conflict misses for high performance architectures. In Proceedings of the 1998 ACM International Conference on Supercomputing, pages 353–360, Melbourne, Australia, July 1998.
Yonghong Song and Zhiyuan Li. New tiling techniques to improve cache temporal locality. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 215–228, Atlanta, GA, May 1999.
Standard Performance Evaluation Corporation, Vols. 1–9. SPEC Newsletter, 1989–1997.
O. Temam, C. Fricker, and W. Jalby. Cache interference phenomena. In Proceedings of ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 261–271, Nashville, TN, May 1994.
Michael E. Wolf and Monica S. Lam. A data locality optimizing algorithm. In Proceedings of ACM SIGPLAN Conference on Programming Languages Design and Implementation, pages 30–44, Toronto, Ontario, Canada, June 1991.
Michael E. Wolf, Dror E. Maydan, and Ding-Kai Chen. Combining loop transformations considering caches and scheduling. In Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, pages 274–286, Paris, France, December 1996.
Michael Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1995.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Song, Y., Li, Z. (2000). A Compiler Framework for Tiling Imperfectly-Nested Loops. In: Carter, L., Ferrante, J. (eds) Languages and Compilers for Parallel Computing. LCPC 1999. Lecture Notes in Computer Science, vol 1863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44905-1_12
Download citation
DOI: https://doi.org/10.1007/3-540-44905-1_12
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67858-8
Online ISBN: 978-3-540-44905-8
eBook Packages: Springer Book Archive