research-article

Polyhedral-based data reuse optimization for configurable computing

Authors:

Louis-Noel Pouchet,

Jason CongAuthors Info & Claims

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Pages 29 - 38

https://doi.org/10.1145/2435264.2435273

Published: 11 February 2013 Publication History

Abstract

Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Significant improvements in the execution time can be achieved with effective utilization of on-chip (scratchpad) memories, associated with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring. In addition, our proposed framework automatically implements critical optimizations for performance such as task-level parallelization, loop pipelining, and data prefetching.

We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to bandwidth constraint. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.

References

[1]

Center for domain-specific computing. http://cdsc.ucla.edu.

[2]

Convey. http://www.conveycomputer.com.

[3]

http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm.

[4]

Pocc 1.1. http://pocc.sourceforge.net.

[5]

An independent evaluation of the autoesl autopilot high-level synthesis tool. Technical report, Berkeley Design Technology, Inc., 2010.

[6]

N. Ahmed, N. Mateev, and K. Pingali. Tiling imperfectly-nested loop nests. In ACM/IEEE Conf. on Supercomputing (SC'00), Dallas, TX, USA, Nov. 2000.

Digital Library

[7]

C. Alias, A. Darte, and A. Plesco. Optimizing remote accesses for offloaded kernels: application to high-level synthesis for fpga. SIGPLAN Not., 47(8):285--286, Feb. 2012.

Digital Library

[8]

J. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers, 2002.

Digital Library

[9]

M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In ACM Symposium on Principles and practice of parallel programming, pages 1--10. ACM, 2008.

Digital Library

[10]

C. Bastoul. Code generation in the polyhedral model is easier than you think. In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT'04), pages 7--16, Sept. 2004.

Digital Library

[11]

S. Bayliss and G. A. Constantinides. Optimizing sdram bandwidth for custom fpga loop accelerators. In Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, FPGA '12, pages 195--204, New York, NY, USA, 2012. ACM.

Digital Library

[12]

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral program optimization system. In ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2008.

Digital Library

[13]

E. Brockmeyer, M. Miranda, and F. Catthoor. Layer assignment techniques for low energy in multi-layered memory organisations. In Design, Automation and Test in Europe Conference and Exhibition, 2003, pages 1070--1075, 2003. DATE.

Digital Library

[14]

F. Catthoor, K. Danckaert, K. Kulkarni, E. Brockmeyer, P. Kjeldsberg, T. v. Achteren, and T. Omnes. Data access and storage management for embedded programmable processors. Kluwer Academic Publishers, Norwell, MA, USA, 2002.

Digital Library

[15]

F. Catthoor, E. d. Greef, and S. Suytack. Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design. Kluwer Academic Publishers, Norwell, MA, USA, 1998.

Digital Library

[16]

J. Cong, K. Guruaj, M. Huang, S. Li, B. Xiao, and Y. Zou. Domain-specific processor with 3d integration for medical image processing. In IEEE Intl. Conf. on Application-Specific Systems, Architectures and Processors, pages 247--250, sept. 2011.

Digital Library

[17]

J. Cong, M. Huang, and Y. Zou. Accelerating fluid registration algorithm on multi-fpga platforms. In Proc. of Intl. Conf. on Field Programmable Logic and Applications (FPL'11). IEEE, 2011.

Digital Library

[18]

J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang. High-level synthesis for fpgas: From prototyping to deployment. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 30(4):473--491, april 2011.

Digital Library

[19]

J. Cong, P. Zhang, and Y. Zou. Optimizing memory hierarchy allocation with loop transformations for high-level synthesis. In Design Automation Conference (DAC'12), June 2012.

Digital Library

[20]

A. Darte, R. Schreiber, and G. Villard. Lattice-based memory allocation. IEEE Trans. Comput., 54(10):1242--1257, 2005.

Digital Library

[21]

P. Diniz, M. Hall, J. Park, B. So, and H. Ziegler. Bridging the gap between compilation and synthesis in the defacto system. In LCPC'03, pages 52--70. 2003.

Digital Library

[22]

P. Feautrier. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Program., 21(5):389--420, 1992.

Digital Library

[23]

S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Intl. J. of Parallel Programming, 34(3), 2006.

Digital Library

[24]

A. Grosslinger. Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes. In Compiler Construction, pages 236--250, 2009.

Digital Library

[25]

A.-C. Guillou, F. Quilleré, P. Quinton, S. Rajopadhye, and T. Risset. Hardware design methodology with the Alpha language. In FDL'01, Lyon, France, Sept. 2001.

[26]

Q. Hu, P. G. Kjeldsberg, A. Vandecappelle, M. Palkovic, and F. Catthoor. Incremental hierarchical memory size estimation for steering of loop transformations. ACM Trans. Des. Autom. Electron. Syst., 12, September 2007.

Digital Library

[27]

F. Irigoin and R. Triolet. Supernode partitioning. In ACM SIGPLAN Principles of Programming Languages, pages 319--329, 1988.

Digital Library

[28]

I. Issenin, E. Brockmeyer, M. Miranda, and N. Dutt. Drdu: A data reuse analysis technique for efficient scratch-pad memory management. ACM Trans. Des. Autom. Electron. Syst., 12, April 2007.

Digital Library

[29]

M. Kandemir and A. Choudhary. Compiler-directed scratch pad memory hierarchy design and management. In Design Automation Conference, 2002. Proceedings. 39th, pages 628--633, 2002.

Digital Library

[30]

I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In ACM SIGPLAN'97 Conf. on Programming Language Design and Implementation, pages 346--357, Las Vegas, June 1997.

Digital Library

[31]

Q. Liu, G. A. Constantinides, K. Masselos, and P. Cheung. Combining data reuse with data-level parallelization for fpga-targeted hardware compilation: A geometric programming framework. Trans. Comp.-Aided Design of Integr. Circuits and Systems, 28(3):305--315, 2009.

Digital Library

[32]

M. Palkovic, F. Catthoor, and H. Corporaal. Trade-offs in loop transformations. ACM Trans. Des. Autom. Electron. Syst., 14:22:1--22:30, April 2009.

Digital Library

[33]

P. R. Panda, N. D. Dutt, and A. Nicolau. Local memory exploration and optimization in embedded systems. IEEE Trans. on CAD of Integrated Circuits and Systems, 18:3--13, January 1999.

Digital Library

[34]

PolyOpt: A complete source-to-source Polyhedral Compiler, http://www.cse.ohio-state.edu/pouchet/polyopt.

[35]

L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In IEEE/ACM Intl. Symp. on Code Generation and Optimization (CGO'07), pages 144--156, 2007.

Digital Library

[36]

B. So, M. W. Hall, and P. C. Diniz. A compiler approach to fast hardware design space exploration in fpga-based systems. In Programming Language Design and Implementation, 2002.

Digital Library

[37]

K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen. Polyhedral-model guided loop-nest auto-vectorization. In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques, pages 327--337, 2009.

Digital Library

[38]

S. Verdoolaege. isl: An integer set library for the polyhedral model. In Mathematical Software - ICMS 2010, pages 299--302, 2010.

Digital Library

[39]

M. Wolf and M. Lam. A data locality optimizing algorithm. In ACM SIGPLAN'91 Conf. on Programming Language Design and Implementation, pages 30--44, New York, June 1991.

Digital Library

[40]

M. Wolfe. Iteration space tiling for memory hierarchies. In 3rd SIAM Conf. on Parallel Processing for Scientific Computing, pages 357--361, Dec. 1987.

Digital Library

[41]

W. Zuo, Y. Liang, P. Li, K. Rupnow, D. Chen, and J. Cong. Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations. In Proc. of the ACM/SIGDA Intl. Symp. on Field Programmable Gate Arrays (FPGA'13), 2013.

Digital Library

Cited By

Pouget SPouchet LCong J(2025)Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming ApproachACM Transactions on Design Automation of Electronic Systems10.1145/371184730:2(1-44)Online publication date: 7-Feb-2025
https://dl.acm.org/doi/10.1145/3711847
Chen HZhang NXiang SZeng ZDai MZhang Z(2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3656401
Jayaweera MKong MWang YKaeli DGrosser TDubach CSteuwer MXue JOttoni GQuintão Pereira F(2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444795
Show More Cited By

Index Terms

Polyhedral-based data reuse optimization for configurable computing
1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Datapath optimization
    2. Logic synthesis
      1. Circuit optimization
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction
FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

This paper presents initial work on developing a C compiler for the CoRAM FPGA computing abstraction. The presented effort focuses on compiling fixed-bound perfect loop nests that operate on large data sets in external DRAM. As required by the CoRAM ...
Efficient hardware code generation for FPGAs

The wider acceptance of FPGAs as a computing device requires a higher level of programming abstraction. ROCCC is an optimizing C to HDL compiler. We describe the code generation approach in ROCCC. The smart buffer is a component that reuses input data ...
Analyzing data reuse for cache reconfiguration

Classical compiler optimizations assume a fixed cache architecture and modify the program to take best advantage of it. In some cases, this may not be the best strategy because each nest might work best with a different cache configuration and ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

February 2013

294 pages

ISBN:9781450318877

DOI:10.1145/2435264

General Chair:
Brad Hutchings
Brigham Young University, USA
,
Program Chair:
Vaughn Betz
University of Toronto, Canada

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 February 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

FPGA '13

Sponsor:

SIGDA

FPGA '13: The 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 11 - 13, 2013

California, Monterey, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Upcoming Conference

FPGA '25

Sponsor:
sigda

The 2025 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 27 - March 1, 2025

Monterey , CA , USA

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

131
Total Citations
View Citations
1,392
Total Downloads

Downloads (Last 12 months)91
Downloads (Last 6 weeks)13

Reflects downloads up to 18 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Pouget SPouchet LCong J(2025)Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming ApproachACM Transactions on Design Automation of Electronic Systems10.1145/371184730:2(1-44)Online publication date: 7-Feb-2025
https://dl.acm.org/doi/10.1145/3711847
Chen HZhang NXiang SZeng ZDai MZhang Z(2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
https://dl.acm.org/doi/10.1145/3656401
Jayaweera MKong MWang YKaeli DGrosser TDubach CSteuwer MXue JOttoni GQuintão Pereira F(2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
https://dl.acm.org/doi/10.1109/CGO57630.2024.10444795
Tu KTang XYu CJosipović LChu ZTu KTang XYu CJosipović LChu Z(2024)High-Level SynthesisFPGA EDA10.1007/978-981-99-7755-0_8(113-134)Online publication date: 1-Feb-2024
https://doi.org/10.1007/978-981-99-7755-0_8
Srivastava NLiu GLai YZhang Z(2024)FPGA-Specific CompilersHandbook of Computer Architecture10.1007/978-981-97-9314-3_25(989-1025)Online publication date: 21-Dec-2024
https://doi.org/10.1007/978-981-97-9314-3_25
Brandner JMayer FPhilippsen M(2024)Multilayer Multipurpose Caches for OpenMP Target Regions on FPGAsAdvancing OpenMP for Future Accelerators10.1007/978-3-031-72567-8_6(79-93)Online publication date: 16-Sep-2024
https://doi.org/10.1007/978-3-031-72567-8_6
Xu RMa SGuo YLi D(2023)A Survey of Design and Optimization for Systolic Array-based DNN AcceleratorsACM Computing Surveys10.1145/360480256:1(1-37)Online publication date: 17-Jun-2023
https://dl.acm.org/doi/10.1145/3604802
Mururu GNi KGavrilovska APande SEgger BLee D(2023)PinIt: Influencing OS Scheduling via Compiler-Induced AffinitiesProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596279(87-98)Online publication date: 13-Jun-2023
https://dl.acm.org/doi/10.1145/3589610.3596279
Ferry CYuki TDerrien SRajopadhye S(2023)Increasing FPGA Accelerators Memory Bandwidth With a Burst-Friendly Memory LayoutIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2022.320149442:5(1546-1559)Online publication date: May-2023
https://doi.org/10.1109/TCAD.2022.3201494
Brandner JMayer FPhilippsen M(2023)Multipurpose Cacheing to Accelerate OpenMP Target Regions on FPGAsOpenMP: Advanced Task-Based, Device and Compiler Programming10.1007/978-3-031-40744-4_10(147-162)Online publication date: 1-Sep-2023
https://doi.org/10.1007/978-3-031-40744-4_10
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten