A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4339)

Abstract

Loop fusion is recognized as an effective program transformation for improving memory hierarchy performance. However, unconstrained loop fusion can degrade performance because it can increase register pressure and cache conflict misses. The complex interaction between the input program and the different levels of the memory hierarchy makes it very difficult to always make the right choice when fusing loops. In this paper, we present a cache-conscious analytical model of loop fusion profitability, intended for use with a constrained weighted fusion algorithm. We then extend the model and show its effectiveness in the context of an empirical tuning framework. A preliminary evaluation of the model is presented using experiments performed by hand on four applications.
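To make the trade-off described in the abstract concrete, here is a minimal sketch in Python. It shows two loops before and after fusion, plus a toy profitability check. The check (`fusion_looks_profitable`, with a hypothetical `max_streams` budget standing in for limited registers or cache ways) is an illustrative assumption, not the analytical model from this paper. Python also does not expose cache behavior, so the code shows only the structure of the transformation, not its performance effect.

```python
from typing import List, Set


def unfused(a: List[float], b: List[float]) -> List[float]:
    # Two separate loops: c is written in the first loop and read in the
    # second, so for large n each c[i] may be evicted before it is reused.
    n = len(a)
    c = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    d = [0.0] * n
    for i in range(n):
        d[i] = c[i] * 2.0
    return d


def fused(a: List[float], b: List[float]) -> List[float]:
    # Fused version: c[i] is consumed in the same iteration that produces it,
    # which is the reuse opportunity loop fusion is meant to exploit.
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = c[i] * 2.0
    return d


def fusion_looks_profitable(loop1: Set[str], loop2: Set[str], max_streams: int) -> bool:
    # Hypothetical heuristic (not the paper's model): fuse only if the loops
    # share at least one array (so fusion creates reuse) and the fused body
    # references no more distinct arrays than the assumed budget, since too
    # many concurrent streams can raise register pressure or conflict misses.
    shared = loop1 & loop2
    combined = loop1 | loop2
    return bool(shared) and len(combined) <= max_streams


print(fused([1.0, 2.0], [3.0, 4.0]))  # [8.0, 12.0]
```

Both versions compute the same result; only the loop structure differs. A real profitability model would replace the simple array count with estimates of reuse, cache capacity, and associativity for the target machine.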

This material is based on work supported by the Department of Energy under Contract Nos. 03891-001-99-4G, 74837-001-03 49, 86192-001-04 49, and 12783-001-05 49 from the Los Alamos National Laboratory.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Qasem, A., Kennedy, K. (2006). A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_8


  • DOI: https://doi.org/10.1007/978-3-540-69330-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69329-1

  • Online ISBN: 978-3-540-69330-7

  • eBook Packages: Computer Science, Computer Science (R0)
