skip to main content
10.1145/2435264.2435273acmconferencesArticle/Chapter ViewAbstractPublication PagesfpgaConference Proceedingsconference-collections
research-article

Polyhedral-based data reuse optimization for configurable computing

Published: 11 February 2013 Publication History

Abstract

Many applications, such as medical imaging, generate intensive data traffic between the FPGA and off-chip memory. Significant improvements in the execution time can be achieved with effective utilization of on-chip (scratchpad) memories, associated with careful software-based data reuse and communication scheduling techniques. We present a fully automated C-to-FPGA framework to address this problem. Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring. In addition, our proposed framework automatically implements critical optimizations for performance such as task-level parallelization, loop pipelining, and data prefetching.
We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management. Our technique can satisfy hardware resource constraints (scratchpad size) while still aggressively exploiting data reuse. Our approach can also be used to reduce the on-chip buffer size subject to bandwidth constraint. We also implement a fast design space exploration technique for effective optimization of program performance using the Xilinx high-level synthesis tool.

References

[1]
Center for domain-specific computing. http://cdsc.ucla.edu.
[2]
Convey. http://www.conveycomputer.com.
[3]
http://www.xilinx.com/products/design-tools/ise-design-suite/index.htm.
[4]
Pocc 1.1. http://pocc.sourceforge.net.
[5]
An independent evaluation of the autoesl autopilot high-level synthesis tool. Technical report, Berkeley Design Technology, Inc., 2010.
[6]
N. Ahmed, N. Mateev, and K. Pingali. Tiling imperfectly-nested loop nests. In ACM/IEEE Conf. on Supercomputing (SC'00), Dallas, TX, USA, Nov. 2000.
[7]
C. Alias, A. Darte, and A. Plesco. Optimizing remote accesses for offloaded kernels: application to high-level synthesis for fpga. SIGPLAN Not., 47(8):285--286, Feb. 2012.
[8]
J. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann Publishers, 2002.
[9]
M. M. Baskaran, U. Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories. In ACM Symposium on Principles and practice of parallel programming, pages 1--10. ACM, 2008.
[10]
C. Bastoul. Code generation in the polyhedral model is easier than you think. In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT'04), pages 7--16, Sept. 2004.
[11]
S. Bayliss and G. A. Constantinides. Optimizing sdram bandwidth for custom fpga loop accelerators. In Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays, FPGA '12, pages 195--204, New York, NY, USA, 2012. ACM.
[12]
U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan. A practical automatic polyhedral program optimization system. In ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2008.
[13]
E. Brockmeyer, M. Miranda, and F. Catthoor. Layer assignment techniques for low energy in multi-layered memory organisations. In Design, Automation and Test in Europe Conference and Exhibition, 2003, pages 1070--1075, 2003. DATE.
[14]
F. Catthoor, K. Danckaert, K. Kulkarni, E. Brockmeyer, P. Kjeldsberg, T. v. Achteren, and T. Omnes. Data access and storage management for embedded programmable processors. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
[15]
F. Catthoor, E. d. Greef, and S. Suytack. Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design. Kluwer Academic Publishers, Norwell, MA, USA, 1998.
[16]
J. Cong, K. Guruaj, M. Huang, S. Li, B. Xiao, and Y. Zou. Domain-specific processor with 3d integration for medical image processing. In IEEE Intl. Conf. on Application-Specific Systems, Architectures and Processors, pages 247--250, sept. 2011.
[17]
J. Cong, M. Huang, and Y. Zou. Accelerating fluid registration algorithm on multi-fpga platforms. In Proc. of Intl. Conf. on Field Programmable Logic and Applications (FPL'11). IEEE, 2011.
[18]
J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang. High-level synthesis for fpgas: From prototyping to deployment. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 30(4):473--491, april 2011.
[19]
J. Cong, P. Zhang, and Y. Zou. Optimizing memory hierarchy allocation with loop transformations for high-level synthesis. In Design Automation Conference (DAC'12), June 2012.
[20]
A. Darte, R. Schreiber, and G. Villard. Lattice-based memory allocation. IEEE Trans. Comput., 54(10):1242--1257, 2005.
[21]
P. Diniz, M. Hall, J. Park, B. So, and H. Ziegler. Bridging the gap between compilation and synthesis in the defacto system. In LCPC'03, pages 52--70. 2003.
[22]
P. Feautrier. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. Int. J. Parallel Program., 21(5):389--420, 1992.
[23]
S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello, M. Sigler, and O. Temam. Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies. Intl. J. of Parallel Programming, 34(3), 2006.
[24]
A. Grosslinger. Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes. In Compiler Construction, pages 236--250, 2009.
[25]
A.-C. Guillou, F. Quilleré, P. Quinton, S. Rajopadhye, and T. Risset. Hardware design methodology with the Alpha language. In FDL'01, Lyon, France, Sept. 2001.
[26]
Q. Hu, P. G. Kjeldsberg, A. Vandecappelle, M. Palkovic, and F. Catthoor. Incremental hierarchical memory size estimation for steering of loop transformations. ACM Trans. Des. Autom. Electron. Syst., 12, September 2007.
[27]
F. Irigoin and R. Triolet. Supernode partitioning. In ACM SIGPLAN Principles of Programming Languages, pages 319--329, 1988.
[28]
I. Issenin, E. Brockmeyer, M. Miranda, and N. Dutt. Drdu: A data reuse analysis technique for efficient scratch-pad memory management. ACM Trans. Des. Autom. Electron. Syst., 12, April 2007.
[29]
M. Kandemir and A. Choudhary. Compiler-directed scratch pad memory hierarchy design and management. In Design Automation Conference, 2002. Proceedings. 39th, pages 628--633, 2002.
[30]
I. Kodukula, N. Ahmed, and K. Pingali. Data-centric multi-level blocking. In ACM SIGPLAN'97 Conf. on Programming Language Design and Implementation, pages 346--357, Las Vegas, June 1997.
[31]
Q. Liu, G. A. Constantinides, K. Masselos, and P. Cheung. Combining data reuse with data-level parallelization for fpga-targeted hardware compilation: A geometric programming framework. Trans. Comp.-Aided Design of Integr. Circuits and Systems, 28(3):305--315, 2009.
[32]
M. Palkovic, F. Catthoor, and H. Corporaal. Trade-offs in loop transformations. ACM Trans. Des. Autom. Electron. Syst., 14:22:1--22:30, April 2009.
[33]
P. R. Panda, N. D. Dutt, and A. Nicolau. Local memory exploration and optimization in embedded systems. IEEE Trans. on CAD of Integrated Circuits and Systems, 18:3--13, January 1999.
[34]
PolyOpt: A complete source-to-source Polyhedral Compiler, http://www.cse.ohio-state.edu/pouchet/polyopt.
[35]
L.-N. Pouchet, C. Bastoul, A. Cohen, and N. Vasilache. Iterative optimization in the polyhedral model: Part I, one-dimensional time. In IEEE/ACM Intl. Symp. on Code Generation and Optimization (CGO'07), pages 144--156, 2007.
[36]
B. So, M. W. Hall, and P. C. Diniz. A compiler approach to fast hardware design space exploration in fpga-based systems. In Programming Language Design and Implementation, 2002.
[37]
K. Trifunovic, D. Nuzman, A. Cohen, A. Zaks, and I. Rosen. Polyhedral-model guided loop-nest auto-vectorization. In IEEE Intl. Conf. on Parallel Architectures and Compilation Techniques, pages 327--337, 2009.
[38]
S. Verdoolaege. isl: An integer set library for the polyhedral model. In Mathematical Software - ICMS 2010, pages 299--302, 2010.
[39]
M. Wolf and M. Lam. A data locality optimizing algorithm. In ACM SIGPLAN'91 Conf. on Programming Language Design and Implementation, pages 30--44, New York, June 1991.
[40]
M. Wolfe. Iteration space tiling for memory hierarchies. In 3rd SIAM Conf. on Parallel Processing for Scientific Computing, pages 357--361, Dec. 1987.
[41]
W. Zuo, Y. Liang, P. Li, K. Rupnow, D. Chen, and J. Cong. Improving High Level Synthesis Optimization Opportunity Through Polyhedral Transformations. In Proc. of the ACM/SIGDA Intl. Symp. on Field Programmable Gate Arrays (FPGA'13), 2013.

Cited By

View all
  • (2025)Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming ApproachACM Transactions on Design Automation of Electronic Systems10.1145/371184730:2(1-44)Online publication date: 7-Feb-2025
  • (2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
  • (2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
February 2013
294 pages
ISBN:9781450318877
DOI:10.1145/2435264
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 February 2013

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. compilation
  2. data reuse
  3. high-level synthesis
  4. program transformations

Qualifiers

  • Research-article

Conference

FPGA '13
Sponsor:

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Upcoming Conference

FPGA '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)91
  • Downloads (Last 6 weeks)13
Reflects downloads up to 18 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Automatic Hardware Pragma Insertion in High-Level Synthesis: A Non-Linear Programming ApproachACM Transactions on Design Automation of Electronic Systems10.1145/371184730:2(1-44)Online publication date: 7-Feb-2025
  • (2024)Allo: A Programming Model for Composable Accelerator DesignProceedings of the ACM on Programming Languages10.1145/36564018:PLDI(593-620)Online publication date: 20-Jun-2024
  • (2024)Energy-Aware Tile Size Selection for Affine Programs on GPUsProceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization10.1109/CGO57630.2024.10444795(13-27)Online publication date: 2-Mar-2024
  • (2024)High-Level SynthesisFPGA EDA10.1007/978-981-99-7755-0_8(113-134)Online publication date: 1-Feb-2024
  • (2024)FPGA-Specific CompilersHandbook of Computer Architecture10.1007/978-981-97-9314-3_25(989-1025)Online publication date: 21-Dec-2024
  • (2024)Multilayer Multipurpose Caches for OpenMP Target Regions on FPGAsAdvancing OpenMP for Future Accelerators10.1007/978-3-031-72567-8_6(79-93)Online publication date: 16-Sep-2024
  • (2023)A Survey of Design and Optimization for Systolic Array-based DNN AcceleratorsACM Computing Surveys10.1145/360480256:1(1-37)Online publication date: 17-Jun-2023
  • (2023)PinIt: Influencing OS Scheduling via Compiler-Induced AffinitiesProceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems10.1145/3589610.3596279(87-98)Online publication date: 13-Jun-2023
  • (2023)Increasing FPGA Accelerators Memory Bandwidth With a Burst-Friendly Memory LayoutIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2022.320149442:5(1546-1559)Online publication date: May-2023
  • (2023)Multipurpose Cacheing to Accelerate OpenMP Target Regions on FPGAsOpenMP: Advanced Task-Based, Device and Compiler Programming10.1007/978-3-031-40744-4_10(147-162)Online publication date: 1-Sep-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media