A Compiler Framework for Tiling Imperfectly-Nested Loops

Song, Yonghong; Li, Zhiyuan

doi:10.1007/3-540-44905-1_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1863)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops such that cache locality is improved. We develop a new memory cost model to analyze data reuse in terms of both the cache and the TLB, based on which we compute the tile size with or without array duplication. We determine whether to duplicate arrays for tiling by comparing the respective exploited reuse factors. The preliminary results with several benchmark programs show that the transformed programs achieve a speedup of 1.09 to 3.82 over the original programs.
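To make the term concrete, the following is a minimal sketch of what tiling an imperfectly-nested loop can look like. It is not the framework described in the abstract: it has no memory cost model, no TLB analysis, and no array duplication, and the tile size is simply passed in by the caller. The function names `matmul_reference` and `matmul_tiled`, the parameter `tile`, and the choice of matrix multiplication as the example kernel are all assumptions made for illustration. The nest is imperfect because the statement that zeroes `c[i][j]` sits between the `j` and `k` loops; the sketch handles it by guarding that statement so it still runs exactly once per element, before any accumulation.

```python
def matmul_reference(a, b, n):
    """Original loop nest. It is imperfectly nested because the
    initialisation of c[i][j] lies outside the innermost k loop."""
    c = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c[i][j] = 0.0                      # statement between loop levels
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Tiled version (illustrative). The j and k loops are split into
    tile-sized blocks so that a tile-by-tile block of b is reused across
    all values of i before the next block is touched."""
    c = [[None] * n for _ in range(n)]
    for jj in range(0, n, tile):               # tile-controlling loop over j
        for kk in range(0, n, tile):           # tile-controlling loop over k
            for i in range(n):
                for j in range(jj, min(jj + tile, n)):
                    if kk == 0:
                        # Guarded copy of the non-innermost statement:
                        # executed once per (i, j), before any accumulation.
                        c[i][j] = 0.0
                    for k in range(kk, min(kk + tile, n)):
                        c[i][j] += a[i][k] * b[k][j]
    return c
```

For every element of `c`, both versions accumulate over `k` in the same ascending order, so they produce identical results; only the order in which different elements are visited changes. In a compiled language that reordering is what improves reuse of `b` in the cache. Here Python is used only to show the loop structure.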

This work is sponsored in part by the National Science Foundation through grants CCR-9975309, CCR-950254, and MIP-9610379, and by the Purdue Research Foundation.

Author information

Authors and Affiliations

Department of Computer Sciences, Purdue University, West Lafayette, IN 47907
Yonghong Song & Zhiyuan Li

Editor information

Editors and Affiliations

Department of Computer Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0114, USA
Larry Carter & Jeanne Ferrante

About this paper

Cite this paper

Song, Y., Li, Z. (2000). A Compiler Framework for Tiling Imperfectly-Nested Loops. In: Carter, L., Ferrante, J. (eds) Languages and Compilers for Parallel Computing. LCPC 1999. Lecture Notes in Computer Science, vol 1863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44905-1_12

DOI: https://doi.org/10.1007/3-540-44905-1_12
Published: 12 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67858-8
Online ISBN: 978-3-540-44905-8
eBook Packages: Springer Book Archive
