research-article

A polyhedral model-based framework for dataflow implementation on FPGA devices of Iterative Stencil Loops

Authors:
Giuseppe Natale, Giulio Stramondo, Pietro Bressana, Riccardo Cattaneo, Donatella Sciuto, Marco D. Santambrogio

DEIB, Politecnico di Milano, Via Ponzio 34/5, Italy

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2016, Pages 1–8
https://doi.org/10.1145/2966986.2966995

Published: 07 November 2016

ABSTRACT

Iterative Stencil Loops (ISLs) are a class of algorithms that appear in many industrial and scientific computing applications, such as numerical methods for solving partial differential equations (e.g. reverse time migration and heat distribution simulation) and cellular automata (used, for instance, for random number generation and error correction). In this work, we propose a hardware acceleration methodology based on the polyhedral model and implement the related framework to automatically accelerate ISLs on a multi-FPGA system. The experimental evaluation shows that the throughput obtained by our solution scales linearly with the amount of resources used on the FPGAs, that the power efficiency increases proportionally to the amount of instantiated computation, and that our solution outperforms the power efficiency of state-of-the-art ISL implementations running on an Intel Xeon CPU by up to 10×. A key aspect of this approach is that no knowledge of the underlying architecture is required of the application designer, as no code refactoring is needed to make the application suitable for processing by our framework.
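
For readers unfamiliar with the term, the sketch below shows the general shape of an Iterative Stencil Loop, using a toy heat-distribution update as an example. It is only an illustration of the algorithm class, not the framework or the hardware design described in the paper. The function names (`make_grid`, `stencil_sweep`, `iterative_stencil_loop`), the 5-point averaging stencil, and the fixed-boundary handling are all assumptions chosen for brevity.

```python
def make_grid(n, hot=100.0):
    """Build an n x n grid that is cold everywhere except a hot top edge (hypothetical setup)."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [hot] * n
    return grid


def stencil_sweep(grid):
    """Apply one 5-point averaging stencil to every interior cell.

    Each new value depends only on the cell and its four neighbours in the
    previous grid; boundary cells are copied unchanged (assumed boundary rule).
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # the input grid is left untouched
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.2 * (
                grid[i][j]
                + grid[i - 1][j] + grid[i + 1][j]
                + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


def iterative_stencil_loop(grid, timesteps):
    """Outer time loop: repeat the same stencil sweep a fixed number of times."""
    for _ in range(timesteps):
        grid = stencil_sweep(grid)
    return grid


if __name__ == "__main__":
    result = iterative_stencil_loop(make_grid(5), timesteps=10)
    for row in result:
        print(" ".join(f"{v:6.2f}" for v in row))
```

The outer time loop wrapped around a fixed, local update pattern is what makes this an ISL. Each sweep consumes only the grid produced by the previous one, which is why such loops are natural candidates for a dataflow-style implementation.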

Index Terms

1. Software and its engineering
  1. Software notations and tools

Index terms have been assigned to the content through auto-classification.

Published in

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2016
946 pages

Copyright © 2016
Publisher: IEEE Press