DOI: 10.1145/2435264.2435271
research-article

Improving high level synthesis optimization opportunity through polyhedral transformations

Published: 11 February 2013

ABSTRACT

High level synthesis (HLS) is an important enabling technology for the adoption of hardware accelerator technologies. It promises the performance and energy efficiency of hardware designs with a lower barrier to entry in design expertise, and shorter design time. State-of-the-art high level synthesis now includes a wide variety of powerful optimizations that implement efficient hardware. These optimizations can implement some of the most important features generally performed in manual designs including parallel hardware units, pipelining of execution both within a hardware unit and between units, and fine-grained data communication. We may generally classify the optimizations as those that optimize hardware implementation within a code block (intra-block) and those that optimize communication and pipelining between code blocks (inter-block). However, both optimizations are in practice difficult to apply. Real-world applications contain data-dependent blocks of code and communicate through complex data access patterns. Existing high level synthesis tools cannot apply these powerful optimizations unless the code is inherently compatible, severely limiting the optimization opportunity. In this paper we present an integrated framework to model and enable both intra- and inter-block optimizations. This integrated technique substantially improves the opportunity to use the powerful HLS optimizations that implement parallelism, pipelining, and fine-grained communication. Our polyhedral model-based technique systematically defines a set of data access patterns, identifies effective data access patterns, and performs the loop transformations to enable the intra- and inter-block optimizations. Our framework automatically explores transformation options, performs code transformations, and inserts the appropriate HLS directives to implement the HLS optimizations. 
Furthermore, our framework can automatically generate the optimized communication blocks for fine-grained communication between hardware blocks. Experimental evaluation demonstrates that we can achieve an average of 6.04X speedup over the high level synthesis solution without our transformations to enable intra- and inter-block optimizations.
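
As a minimal illustration of the kind of loop transformation the abstract refers to, the C sketch below shows a producer block and a consumer block that communicate through an intermediate array. It is not taken from the paper: the function names, the array sizes `N` and `M`, and the column-sum computation are all hypothetical. In the original consumer, the array is read column by column while the producer writes it row by row, so the two blocks can only communicate through a full buffer. After loop interchange, the consumer reads elements in the same order the producer writes them, which is the sort of compatible access pattern that fine-grained, streaming communication between blocks would need.

```c
#include <assert.h>

#define N 4 /* rows (hypothetical size) */
#define M 3 /* columns (hypothetical size) */

/* Producer block: writes tmp in row-major order (i outer, j inner). */
static void produce(int in[N][M], int tmp[N][M]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            tmp[i][j] = 2 * in[i][j];
}

/* Consumer block, original form: reads tmp column by column, so the
 * read order does not match the producer's write order. */
static void consume_original(int tmp[N][M], int col_sum[M]) {
    for (int j = 0; j < M; j++) {
        col_sum[j] = 0;
        for (int i = 0; i < N; i++)
            col_sum[j] += tmp[i][j];
    }
}

/* Consumer block after loop interchange: reads tmp in row-major order,
 * matching the producer. For each j the additions still happen in
 * increasing i, so the result is unchanged. */
static void consume_interchanged(int tmp[N][M], int col_sum[M]) {
    for (int j = 0; j < M; j++)
        col_sum[j] = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            col_sum[j] += tmp[i][j];
}

/* Self-check: both consumer forms must produce identical column sums. */
static int check_equivalence(void) {
    int in[N][M], tmp[N][M], a[M], b[M];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            in[i][j] = i * M + j;
    produce(in, tmp);
    consume_original(tmp, a);
    consume_interchanged(tmp, b);
    for (int j = 0; j < M; j++)
        assert(a[j] == b[j]);
    return 1;
}
```

Calling `check_equivalence()` confirms that the interchange preserves the result for this example. The sketch only shows the source-level restructuring; which transformations the framework actually selects, and which HLS directives it inserts, is described in the paper itself.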


    • Published in

      FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
      February 2013
      294 pages
      ISBN: 9781450318877
      DOI: 10.1145/2435264

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 125 of 627 submissions, 20%
