DOI: 10.1145/3508352.3549374
research-article

Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping

Published: 22 December 2022

Abstract

The multi-pumping resource sharing technique can overcome the limitations commonly found in single-clocked FPGA designs by allowing hardware components to operate at a higher clock frequency than the surrounding system. However, this optimization cannot be expressed at high levels of abstraction, such as HLS, and instead requires hand-optimized RTL. In this paper, we show how to leverage multiple clock domains for computational subdomains on reconfigurable devices through data movement analysis on high-level programs. We offer a novel view on multi-pumping as a compiler optimization: a superclass of traditional vectorization. As multiple data elements are fed and consumed, the computations are packed temporally rather than spatially. The optimization is applied automatically using an intermediate representation that maps high-level code to HLS. Internally, the optimization injects modules into the generated designs, incorporating RTL for fine-grained control over the clock domains. We reduce resource consumption by up to 50% on critical components and by 23% on average. For scalable designs, this can enable further parallelism, increasing overall performance.
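
The contrast between spatial and temporal packing can be illustrated with a small software model. The sketch below is not the paper's compiler pass or its generated hardware; it is a minimal Python simulation under stated assumptions: the names `spatial_vectorized`, `temporal_vectorized`, `width`, and `pump` are hypothetical, one wide word of `width` elements arrives per slow-clock cycle, and the fast clock domain is assumed to run exactly `pump` times faster than the surrounding system.

```python
from typing import Callable, List, Tuple

Result = Tuple[List[int], int, int]  # (outputs, slow-clock cycles, compute units)


def spatial_vectorized(data: List[int], op: Callable[[int], int], width: int) -> Result:
    """Traditional vectorization: `width` compute units work side by side."""
    out: List[int] = []
    slow_cycles = 0
    for i in range(0, len(data), width):
        wide_word = data[i:i + width]          # one wide word per slow cycle
        out.extend(op(x) for x in wide_word)   # all lanes computed in the same cycle
        slow_cycles += 1
    return out, slow_cycles, width


def temporal_vectorized(data: List[int], op: Callable[[int], int],
                        width: int, pump: int) -> Result:
    """Multi-pumping model: width // pump units, reused `pump` times per slow cycle."""
    assert width % pump == 0, "assumption: pump factor divides the vector width"
    units = width // pump
    out: List[int] = []
    slow_cycles = 0
    for i in range(0, len(data), width):
        wide_word = data[i:i + width]          # same wide word per slow cycle
        for p in range(pump):                  # `pump` fast cycles inside one slow cycle
            narrow_word = wide_word[p * units:(p + 1) * units]
            out.extend(op(x) for x in narrow_word)
        slow_cycles += 1
    return out, slow_cycles, units


data = list(range(16))
op = lambda x: 3 * x + 1  # stand-in for an expensive arithmetic unit

spatial_out, spatial_cycles, spatial_units = spatial_vectorized(data, op, width=4)
temporal_out, temporal_cycles, temporal_units = temporal_vectorized(data, op, width=4, pump=2)
```

In this model both variants produce the same outputs in the same number of slow-clock cycles, while the temporal variant instantiates `width // pump` compute units instead of `width`. Real designs also need logic to cross between clock domains and to split and reassemble the wide words, which this sketch does not model.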

Information & Contributors

Information

Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022
1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352
© 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

In-Cooperation

  • IEEE-EDS: Electronic Devices Society
  • IEEE CAS
  • IEEE CEDA

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Funding Sources

  • Swiss National Science Foundation (Ambizione Project)
  • European Research Council grant PSAP
  • Horizon Europe DEEP-SEA Programme
  • Innovation Fund Denmark

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 76
  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 2

Reflects downloads up to 28 Feb 2025
