DOI: 10.1145/1941553.1941596
poster

Time skewing made simple

Published: 12 February 2011

Abstract

Time skewing and loop tiling have long been known to be highly beneficial acceleration techniques for nested loops, especially on bandwidth-hungry multi-core processors. They are little used in practice, however, because efficient implementations require complicated code, while simple or abstract ones show much smaller gains over naive nested loops. We break this dilemma with an essential time skewing scheme that is both compact and fast.
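
To make the contrast between naive nested loops and time skewing concrete, here is a minimal sketch for a 1D three-point averaging stencil. It is not the scheme presented in the poster; it only illustrates the general idea under simple assumptions (fixed boundary cells, two alternating buffers, a skew of one cell per time step). The function names and the `tile` parameter are hypothetical and chosen for illustration.

```python
def jacobi_naive(u0, steps):
    """Naive nested loops: sweep the whole array once per time step."""
    n = len(u0)
    bufs = [list(u0), list(u0)]  # two buffers; boundary cells are never written
    for t in range(1, steps + 1):
        src, dst = bufs[(t - 1) % 2], bufs[t % 2]
        for i in range(1, n - 1):
            dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return bufs[steps % 2]


def jacobi_time_skewed(u0, steps, tile=8):
    """Same computation, iterated tile by tile in the skewed coordinate j = i + t.

    In (t, j) space every point depends only on points with smaller or equal j
    at time t - 1, so tiles can be processed left to right, each one advancing
    through all time steps while its data is still close to the processor.
    """
    n = len(u0)
    bufs = [list(u0), list(u0)]
    # Interior points have 1 <= i <= n - 2 and 1 <= t <= steps,
    # so j = i + t ranges over [2, n - 2 + steps].
    for j0 in range(2, n - 1 + steps, tile):
        for t in range(1, steps + 1):
            src, dst = bufs[(t - 1) % 2], bufs[t % 2]
            lo = max(j0, t + 1)                  # clip tile to i >= 1
            hi = min(j0 + tile - 1, t + n - 2)   # clip tile to i <= n - 2
            for j in range(lo, hi + 1):
                i = j - t                        # back to the array index
                dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return bufs[steps % 2]


if __name__ == "__main__":
    u0 = [float((i * i) % 7) for i in range(20)]
    same = jacobi_time_skewed(u0, 5, tile=4) == jacobi_naive(u0, 5)
    print("results identical:", same)
```

Both functions evaluate the same expression on the same inputs for every point, so their results agree exactly; only the iteration order differs. In a compiled language the skewed order is what reduces memory traffic, because each tile is reused across many time steps before the next tile is touched.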

Published In

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN:9781450301190
DOI:10.1145/1941553
  • General Chair: Calin Cascaval
  • Program Chair: Pen-Chung Yew
  • ACM SIGPLAN Notices, Volume 46, Issue 8
    PPoPP '11
    August 2011
    300 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2038037

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bandwidth
  2. data locality
  3. loop tiling
  4. memory bound
  5. memory wall
  6. stencil
  7. temporal blocking
  8. time skewing

Qualifiers

  • Poster

Conference

PPoPP '11
Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
