poster

Time skewing made simple

Authors:

Robert Strzodka,

Mohammed Shaheen,

Dawid Pajak

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

Pages 295 - 296

https://doi.org/10.1145/1941553.1941596

Published: 12 February 2011


Abstract

Time skewing and loop tiling have long been known to be highly beneficial acceleration techniques for nested loops, especially on bandwidth-hungry multi-core processors, but they are little used in practice because efficient implementations require complicated code, while simple or abstract ones show much smaller gains over naive nested loops. We resolve this dilemma with an essential time skewing scheme that is both compact and fast.
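To make the term concrete, the following is a minimal textbook-style sketch of time skewing for a 1D three-point averaging stencil. It is not the scheme from this poster. The function names `jacobi_naive` and `jacobi_time_skewed`, the `tile` parameter, and the stencil itself are assumptions chosen for illustration. The idea shown is that tiles are formed in the skewed coordinate `j = i + s` (space index plus time step), so all time steps for one tile run while its data is still in cache.

```python
def jacobi_naive(u, steps):
    """Reference version: sweep the whole array once per time step."""
    old, new = list(u), list(u)
    for _ in range(steps):
        for i in range(1, len(u) - 1):
            new[i] = (old[i - 1] + old[i] + old[i + 1]) / 3.0
        old, new = new, old
    return old


def jacobi_time_skewed(u, steps, tile):
    """Same computation, traversed tile by tile in the skewed index j = i + s.

    The point updated in step s at position i depends on positions
    i-1, i, i+1 of step s-1, whose skewed indices are j-2, j-1 and j.
    Processing tiles in increasing j, and steps in increasing s inside
    a tile, therefore respects every dependency.
    """
    n = len(u)
    # Two buffers holding even and odd time levels; both carry the
    # fixed boundary values u[0] and u[n-1].
    buf = [list(u), list(u)]
    for j0 in range(1, n + steps - 2, tile):       # tiles in skewed space
        for s in range(steps):                     # all time steps per tile
            src, dst = buf[s % 2], buf[(s + 1) % 2]
            lo = max(j0, 1 + s)                    # clip to interior: i >= 1
            hi = min(j0 + tile, n - 1 + s)         # clip to interior: i <= n-2
            for j in range(lo, hi):
                i = j - s                          # undo the skew
                dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return buf[steps % 2]


if __name__ == "__main__":
    data = [float((7 * k) % 11) for k in range(50)]
    print(jacobi_time_skewed(data, 13, 8) == jacobi_naive(data, 13))
```

Both functions perform the same floating-point operations for each point, only in a different order, so their results should match exactly. In Python the reordering brings no speedup; the sketch only shows the traversal order that a compiled implementation would use to reduce memory traffic.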

Index Terms

Time skewing made simple
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Published In

PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

February 2011

326 pages

ISBN:9781450301190

DOI:10.1145/1941553

General Chair: Calin Cascaval (Qualcomm Research, USA)
Program Chair: Pen-Chung Yew (Academia Sinica, Taiwan and University of Minnesota at Twin Cities, USA)

ACM SIGPLAN Notices Volume 46, Issue 8
PPoPP '11
August 2011
300 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2038037

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

PPoPP '11

Sponsor:

SIGPLAN

PPoPP '11: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

February 12 - 16, 2011

San Antonio, TX, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
