DOI: 10.1145/2966986.2966995
research-article

A polyhedral model-based framework for dataflow implementation on FPGA devices of Iterative Stencil Loops

Published: 07 November 2016

ABSTRACT

Iterative Stencil Loops (ISLs) are a class of algorithms of great importance because they appear in many industrial and scientific computing applications, such as numerical methods for solving partial differential equations (e.g. reverse time migration and heat distribution simulation) and cellular automata (used, for instance, for random number generation and error correction). In this work, we propose a hardware acceleration methodology based on the polyhedral model and implement the related framework to automatically accelerate ISLs on a multi-FPGA system. The experimental evaluation shows that the throughput obtained by our solution scales linearly with the amount of resources used on the FPGAs, that the power efficiency increases proportionally to the amount of instantiated computation, and that our solution outperforms the power efficiency of state-of-the-art ISL implementations running on an Intel Xeon CPU by up to 10×. A key aspect of this approach is that no knowledge of the underlying architecture is required of the application designer, as no code refactoring is needed to make the application suitable for processing by our framework.
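For readers unfamiliar with the term, the sketch below illustrates the general shape of an ISL using a heat-distribution style update. It is only an illustration and is not taken from the paper: the function names `jacobi_step` and `run_isl`, the 5-point averaging stencil, and the fixed-boundary handling are all assumptions chosen for brevity.

```python
def jacobi_step(grid):
    """Apply one sweep of a 5-point averaging stencil (hypothetical example).

    Boundary cells are copied unchanged; each interior cell becomes the
    average of its four direct neighbours in the previous grid.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so the input grid is not mutated
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


def run_isl(grid, steps):
    """Iterate the stencil: the outer time loop is what makes the loop nest an ISL."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid
```

The outer time loop wraps a spatial loop nest in which every point is updated from a fixed pattern of neighbours, which is the regular structure that polyhedral analysis can reason about.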


Published in: 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2016, 946 pages
Copyright © 2016
Publisher: IEEE Press
