ABSTRACT
To leverage the vast parallelism of loops, embedded loop accelerators often take the form of processor arrays with many simple processing elements. Each processing element executes a subset of a loop's iterations, exploiting instruction- and data-level parallelism by tightly scheduling iterations via software pipelining and packing instructions into compact, individual programs. However, loop bounds are often unknown until runtime, which complicates the static generation of programs because the bounds influence each program's control flow.
Existing solutions, like generating and storing all possible programs or full just-in-time compilation, are prohibitively expensive, especially in embedded systems. As a remedy, we propose a hybrid approach introducing a tree-like program representation, whose generation front-loads all intractable sub-problems to compile time, and from which all concrete program variants can efficiently be stitched together at runtime. The tree consists of so-called polyhedral fragments that represent concrete program parts and are annotated with iteration-dependent conditions.
We show that this representation is both space- and time-efficient: it requires polynomial space to store---whereas storing all possibly generated programs is non-polynomial---and polynomial time to evaluate---whereas just-in-time compilation requires solving NP-hard problems. In a case study, we show for a representative loop program that using a tree of polyhedral fragments saves 98.88 % of space compared to storing all program variants.
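The runtime stitching described above can be illustrated with a minimal sketch. The following Python snippet is purely hypothetical and not the paper's implementation: each tree node carries a concrete program fragment and an iteration-dependent condition on the loop bounds; at runtime, the tree is walked once, and only the fragments whose conditions hold for the now-known bounds are concatenated into the final program.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch of a tree of "polyhedral fragments": a node holds a
# concrete program fragment (here, a list of instruction strings) and a
# condition over the loop bounds that are only known at runtime.
@dataclass
class FragmentNode:
    condition: Callable[[Dict[str, int]], bool] = lambda bounds: True
    fragment: List[str] = field(default_factory=list)
    children: List["FragmentNode"] = field(default_factory=list)

def stitch(node: FragmentNode, bounds: Dict[str, int]) -> List[str]:
    """Stitch a concrete program variant by walking the tree and keeping
    only the fragments whose conditions hold for the given loop bounds."""
    if not node.condition(bounds):
        return []
    program = list(node.fragment)
    for child in node.children:
        program.extend(stitch(child, bounds))
    return program

# Example tree for a software-pipelined loop: the prologue and epilogue are
# always emitted, the steady-state kernel only if the trip count N suffices.
tree = FragmentNode(fragment=["prologue"], children=[
    FragmentNode(condition=lambda b: b["N"] >= 4, fragment=["kernel"]),
    FragmentNode(fragment=["epilogue"]),
])
print(stitch(tree, {"N": 8}))  # ['prologue', 'kernel', 'epilogue']
print(stitch(tree, {"N": 2}))  # ['prologue', 'epilogue']
```

Because evaluating each condition and concatenating fragments takes time linear in the tree size, this stitching step stays polynomial, consistent with the complexity claim above.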