DOI: 10.1145/2435264.2435271

Improving high level synthesis optimization opportunity through polyhedral transformations

Published: 11 February 2013

Abstract

High level synthesis (HLS) is an important enabling technology for the adoption of hardware accelerator technologies. It promises the performance and energy efficiency of hardware designs with a lower barrier to entry in design expertise, and shorter design time. State-of-the-art high level synthesis now includes a wide variety of powerful optimizations that implement efficient hardware. These optimizations can implement some of the most important features generally performed in manual designs including parallel hardware units, pipelining of execution both within a hardware unit and between units, and fine-grained data communication. We may generally classify the optimizations as those that optimize hardware implementation within a code block (intra-block) and those that optimize communication and pipelining between code blocks (inter-block). However, both optimizations are in practice difficult to apply. Real-world applications contain data-dependent blocks of code and communicate through complex data access patterns. Existing high level synthesis tools cannot apply these powerful optimizations unless the code is inherently compatible, severely limiting the optimization opportunity. In this paper we present an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially improves the opportunity to use the powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. Our polyhedral model-based technique systematically defines a set of data access patterns, identifies effective data access patterns, and performs the loop transformations to enable the intra- and inter-block optimizations. Our framework automatically explores transformation options, performs code transformations, and inserts the appropriate HLS directives to implement the HLS optimizations. Furthermore, our framework can automatically generate the optimized communication blocks for fine-grained communication between hardware blocks. Experimental evaluation demonstrates that we can achieve an average of 6.04X speedup over the high level synthesis solution without our transformations to enable intra- and inter-block optimizations.
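
To make the inter-block idea concrete, the following is a minimal sketch in C and not the paper's implementation. It assumes two hypothetical code blocks: block A produces a small array and block B consumes it row by row. If A writes column by column, B cannot finish its first row until A is nearly done, so the blocks need a full buffer between them. Interchanging A's loops (legal here because its iterations are independent) aligns the production order with the consumption order, so a FIFO can replace the buffer. All names (`value`, `buffered`, `streamed`, `elements_before_first_row`) and the sizes `N` and `M` are illustrative assumptions. The `#pragma HLS pipeline II=1` spelling follows Vivado/Vitis HLS and is an assumption about the target tool; an ordinary C compiler ignores it.

```c
#include <assert.h>

#define N 4   /* rows (hypothetical size) */
#define M 3   /* columns (hypothetical size) */

/* Stand-in for the work block A does to compute element (i, j). */
static int value(int i, int j) { return 10 * i + j; }

/* Baseline: block A fills the buffer column by column, then block B
   reads it row by row, weighting each element by its read position. */
int buffered(void) {
    int buf[N][M];
    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            buf[i][j] = value(i, j);

    int acc = 0, k = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            acc += (++k) * buf[i][j];
    return acc;
}

/* After interchanging A's loops, A emits elements in exactly the order
   B reads them, so the 2-D buffer is replaced by a FIFO (modelled here
   as an array with a write index and a read index). */
int streamed(void) {
    int fifo[N * M];
    int wr = 0, rd = 0, acc = 0;

    for (int i = 0; i < N; i++)           /* block A, loops interchanged */
        for (int j = 0; j < M; j++) {
#pragma HLS pipeline II=1
            assert(wr < N * M);           /* FIFO model must not overflow */
            fifo[wr++] = value(i, j);
        }

    for (int k = 1; k <= N * M; k++) {    /* block B, sequential reads */
#pragma HLS pipeline II=1
        assert(rd < wr);                  /* never read ahead of the writer */
        acc += k * fifo[rd++];
    }
    return acc;
}

/* How many elements A must produce before B's first row is complete. */
int elements_before_first_row(int column_major) {
    return column_major ? (M - 1) * N + 1 : M;
}
```

In this software model the two loops in `streamed` still run one after the other; the point is only that every read is in FIFO order, which is the property an HLS tool would need before it could overlap the two blocks. With these sizes, `elements_before_first_row` reports 9 elements for the column-major producer and 3 for the interchanged one.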

    Published In

    FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
    February 2013
    294 pages
    ISBN: 9781450318877
    DOI: 10.1145/2435264

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. FPGA
    2. high level synthesis
    3. polyhedral

    Qualifiers

    • Research-article

    Conference

    FPGA '13

    Acceptance Rates

    Overall Acceptance Rate 125 of 627 submissions, 20%

    Cited By

    • (2024) Learning to Compare Hardware Designs for High-Level Synthesis. Proceedings of the 2024 ACM/IEEE International Symposium on Machine Learning for CAD, pp. 1-7. DOI: 10.1145/3670474.3685940. Online publication date: 9-Sep-2024
    • (2024) Learning to Compare Hardware Designs for High-Level Synthesis. 2024 ACM/IEEE 6th Symposium on Machine Learning for CAD (MLCAD), pp. 1-7. DOI: 10.1109/MLCAD62225.2024.10740257. Online publication date: 9-Sep-2024
    • (2024) Array Partitioning Method for Streaming Dataflow Optimization in High-level Synthesis. 2024 2nd International Symposium of Electronics Design Automation (ISEDA), pp. 278-282. DOI: 10.1109/ISEDA62518.2024.10618042. Online publication date: 10-May-2024
    • (2024) An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 75-90. DOI: 10.1109/HPCA57654.2024.00017. Online publication date: 2-Mar-2024
    • (2024) High-Level Synthesis. FPGA EDA, pp. 113-134. DOI: 10.1007/978-981-99-7755-0_8. Online publication date: 1-Feb-2024
    • (2023) Parallelising Control Flow in Dynamic-scheduling High-level Synthesis. ACM Transactions on Reconfigurable Technology and Systems 16(4), pp. 1-32. DOI: 10.1145/3599973. Online publication date: 1-Sep-2023
    • (2023) Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators. ACM Transactions on Architecture and Code Optimization 20(2), pp. 1-26. DOI: 10.1145/3572908. Online publication date: 1-Mar-2023
    • (2023) High-level Synthesis for Domain Specific Computing. Proceedings of the 2023 International Symposium on Physical Design, pp. 211-219. DOI: 10.1145/3569052.3580027. Online publication date: 26-Mar-2023
    • (2023) AHA: An Agile Approach to the Design of Coarse-Grained Reconfigurable Accelerators and Compilers. ACM Transactions on Embedded Computing Systems 22(2), pp. 1-34. DOI: 10.1145/3534933. Online publication date: 24-Jan-2023
    • (2023) IronMan-Pro: Multiobjective Design Space Exploration in HLS via Reinforcement Learning and Graph Neural Network-Based Modeling. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42(3), pp. 900-913. DOI: 10.1109/TCAD.2022.3185540. Online publication date: Mar-2023
