poster

DMATiler: revisiting loop tiling for direct memory access

Authors: Haibo Lin, Tao Liu, Huoding Li, Tong Chen, Lakshminarayanan Renganarayana, John Kevin O'Brien, Ling Shao

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

Pages 559 - 560

https://doi.org/10.1145/1854273.1854351

Published: 11 September 2010

Abstract

In this paper we present the design and implementation of DMATiler, which combines compiler analysis and runtime management to optimize local memory performance. In traditional loop tiling optimizations based on a cache model, the compiler approximates runtime cache misses as the number of distinct cache lines touched by a loop nest. In contrast, DMATiler has full control of the addresses, sizes, and sequences of data transfers. DMATiler uses a simplified DMA performance model to formulate the cost model for DMA-tiled loop nests, then solves it using a custom gradient descent algorithm with heuristics guided by DMA characteristics. Given a loop nest, DMATiler uses loop interchange to make the loop order friendlier to data movement. Moreover, DMATiler applies compressed data buffers and advanced DMA commands to further optimize data transfers. We have implemented DMATiler in the IBM XL C/C++ for Multi-core Acceleration for Linux compiler and have conducted experiments with a set of loop nest benchmarks. The results show that, on the Cell BE processor, DMATiler is much more efficient than a software-controlled cache (average speedup of 9.8x) and single-level loop blocking (average speedup of 6.2x).
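
The abstract does not give the cost model or the search procedure, so the following is only a minimal sketch of the general idea, under stated assumptions. It models the DMA cost of a tiled n x n x n matrix multiply as a fixed startup cost per DMA command plus a bandwidth term, then picks tile sizes that fit a local store budget. All constants, the choice of loop nest, and the names `dma_cycles`, `footprint`, `transfer_cost`, `neighbors`, and `search_tile` are hypothetical. The search is a plain greedy neighbor descent (doubling or halving one tile dimension at a time), used here as a stand-in for the paper's custom gradient descent, not a reproduction of it.

```python
from math import ceil

# All constants are illustrative assumptions, not measured Cell BE values.
DMA_STARTUP_CYCLES = 200.0        # fixed cost per DMA command
DMA_BYTES_PER_CYCLE = 8.0         # sustained transfer rate
LOCAL_STORE_BUDGET = 128 * 1024   # bytes available for tile buffers
ELEM_BYTES = 4                    # size of one array element


def dma_cycles(num_transfers, bytes_each):
    """Modeled cost of num_transfers contiguous transfers of bytes_each bytes."""
    return num_transfers * (DMA_STARTUP_CYCLES + bytes_each / DMA_BYTES_PER_CYCLE)


def footprint(tile):
    """Local store bytes needed to hold one tile each of A, B, and C."""
    ti, tj, tk = tile
    return (ti * tk + tk * tj + ti * tj) * ELEM_BYTES


def transfer_cost(tile, n):
    """Modeled DMA cycles for C += A * B tiled as (ti, tj, tk).

    Partial tiles at the array edges are charged as full tiles.
    """
    ti, tj, tk = tile
    ni, nj, nk = ceil(n / ti), ceil(n / tj), ceil(n / tk)
    # A and B tiles are fetched row by row for every (i, j, k) tile.
    per_tile = dma_cycles(ti, tk * ELEM_BYTES) + dma_cycles(tk, tj * ELEM_BYTES)
    # The C tile is loaded and stored once per (i, j) tile (k is innermost).
    per_c_tile = 2 * dma_cycles(ti, tj * ELEM_BYTES)
    return ni * nj * nk * per_tile + ni * nj * per_c_tile


def neighbors(tile, n):
    """Feasible tiles reached by doubling or halving one dimension."""
    for dim in range(3):
        for factor in (2.0, 0.5):
            size = int(tile[dim] * factor)
            if 1 <= size <= n:
                cand = list(tile)
                cand[dim] = size
                cand = tuple(cand)
                if footprint(cand) <= LOCAL_STORE_BUDGET:
                    yield cand


def search_tile(n, start=(4, 4, 4)):
    """Greedy descent: move to the cheapest neighbor until none improves."""
    best, best_cost = start, transfer_cost(start, n)
    while True:
        cands = [(transfer_cost(c, n), c) for c in neighbors(best, n)]
        if not cands:
            return best, best_cost
        cost, cand = min(cands)
        if cost >= best_cost:
            return best, best_cost
        best, best_cost = cand, cost


if __name__ == "__main__":
    tile, cost = search_tile(1024)
    print("tile (ti, tj, tk):", tile, "modeled DMA cycles:", cost)
```

Because each row of a tile is modeled as a separate DMA command, the model rewards longer contiguous rows and larger tiles until the local store budget is reached. This is the kind of trade-off a DMA-aware cost model can expose and a cache-line-counting model cannot. The descent only finds a local minimum of this toy model.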

Index Terms

DMATiler: revisiting loop tiling for direct memory access
1. Software and its engineering
  1. Software notations and tools
    1. Compilers

Information & Contributors

Information

Published In

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques

September 2010

596 pages

ISBN:9781450301787

DOI:10.1145/1854273

General Chair: Valentina Salapura (IBM TJ Watson Research Center)

Program Chairs: Michael Gschwind (IBM Systems & Technology Group), Jens Knoop (Technische Universität Wien)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

PACT '10

Sponsor:

IFIP WG 10.3
IEEE CS TCPP
SIGARCH
IEEE CS TCAA

PACT '10: International Conference on Parallel Architectures and Compilation Techniques

September 11 - 15, 2010

Vienna, Austria

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%

Bibliometrics

Total Citations: 5
Total Downloads: 241
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Mar 2025

Cited By

Singer A, Wang K, Egger B, Lee D (2023). Tiling for DMA-Based Hardware Accelerators (WIP). Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 138-142. DOI: 10.1145/3589610.3596283. Online publication date: 13-Jun-2023
https://dl.acm.org/doi/10.1145/3589610.3596283

Şuşu A (2020). A Vector-Length Agnostic Compiler for the Connex-S Accelerator with Scratchpad Memory. ACM Transactions on Embedded Computing Systems 19:6, pp. 1-30. DOI: 10.1145/3406536. Online publication date: 3-Oct-2020
https://dl.acm.org/doi/10.1145/3406536

Qiu K, Ni Y, Zhang W, Wang J, Wu X, Xue C, Li T (2016). An adaptive Non-Uniform Loop Tiling for DMA-based bulk data transfers on many-core processor. 2016 IEEE 34th International Conference on Computer Design (ICCD), pp. 9-16. DOI: 10.1109/ICCD.2016.7753255. Online publication date: Oct-2016
https://doi.org/10.1109/ICCD.2016.7753255

Srinivas J, Ding W, Kandemir M, Olukotun K, Smith A, Hundt R, Mars J (2015). Reactive tiling. Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 91-102. DOI: 10.5555/2738600.2738612. Online publication date: 7-Feb-2015
https://dl.acm.org/doi/10.5555/2738600.2738612

Srinivas J, Ding W, Kandemir M (2015). Reactive tiling. 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 91-102. DOI: 10.1109/CGO.2015.7054190. Online publication date: Feb-2015
https://doi.org/10.1109/CGO.2015.7054190
