A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion

Qasem, Apan; Kennedy, Ken

doi:10.1007/978-3-540-69330-7_8


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4339)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing


Abstract

Loop fusion is recognized as an effective program transformation for improving memory hierarchy performance. However, unconstrained loop fusion can degrade performance because it increases register pressure and cache conflict misses. The complex interaction between the input program and the different levels of the memory hierarchy makes it difficult to decide reliably which loops to fuse. In this paper, we present a cache-conscious analytical model for profitable loop fusion, intended for use with a constrained weighted fusion algorithm. We then extend the model and show its effectiveness within an empirical tuning framework. A preliminary evaluation of the model is presented using hand-conducted experiments on four applications.
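The abstract does not spell out the model itself. As a rough illustration of how a constrained profitability check might feed a weighted fusion algorithm, the following Python sketch assigns a reuse weight to a pair of loops and sets it to zero when fusion would likely overflow the register file or overload the cache's associativity. The class names, fields, and both thresholds are hypothetical assumptions chosen for illustration; this is not the authors' cache-conscious model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSummary:
    """Hypothetical per-loop summary (illustrative fields, not the paper's)."""
    live_values: int   # values assumed to stay in registers each iteration
    arrays: frozenset  # names of the arrays the loop references


@dataclass(frozen=True)
class Machine:
    registers: int      # allocatable registers (assumed parameter)
    associativity: int  # cache ways (assumed parameter)


def fusion_weight(a: LoopSummary, b: LoopSummary, m: Machine) -> int:
    """Return an edge weight for a weighted fusion graph, or 0 if fusing
    loops a and b violates one of two simple, hypothetical constraints.

    Weight: number of arrays referenced by both loops (reuse fusion exposes).
    Constraint 1: combined register demand must fit in m.registers.
    Constraint 2: distinct arrays in the fused body must not exceed the
    cache associativity, used here as a crude proxy for conflict-miss risk.
    """
    shared = a.arrays & b.arrays
    if not shared:
        return 0  # no common arrays, so fusion exposes no reuse
    if a.live_values + b.live_values > m.registers:
        return 0  # fused body would likely spill registers
    if len(a.arrays | b.arrays) > m.associativity:
        return 0  # more array streams than cache ways: conflicts likely
    return len(shared)


# Usage: two loops that share array "b" on a 16-register, 4-way machine.
m = Machine(registers=16, associativity=4)
l1 = LoopSummary(live_values=4, arrays=frozenset({"a", "b"}))
l2 = LoopSummary(live_values=5, arrays=frozenset({"b", "c"}))
print(fusion_weight(l1, l2, m))  # 1
```

Returning 0 for a constraint violation lets a greedy weighted fusion pass simply skip that edge; a more faithful model would replace the associativity test with an actual estimate of conflict misses.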

This material is based on work supported by the Department of Energy under Contract Nos. 03891-001-99-4G, 74837-001-03 49, 86192-001-04 49, and 12783-001-05 49 from the Los Alamos National Laboratory.



Author information

Authors and Affiliations

Department of Computer Science, Rice University, Houston, TX
Apan Qasem & Ken Kennedy


Editor information

Editors and Affiliations

BSC-UPC
Eduard Ayguadé
Department of Computer Science, Louisiana State University, 70803, Baton Rouge, LA, USA
Gerald Baumgartner
Dept. of Electrical and Computer Engg., Louisiana State University, Baton Rouge, LA, USA
J. Ramanujam
Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Avenue, 43210, Columbus, OH, USA
P. Sadayappan


About this paper

Cite this paper

Qasem, A., Kennedy, K. (2006). A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_8


DOI: https://doi.org/10.1007/978-3-540-69330-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69329-1
Online ISBN: 978-3-540-69330-7
eBook Packages: Computer Science (R0)
