DOI: 10.1145/3241793.3241805
research-article

Use of CPU Performance Counters for Accelerator Selection in HLS-Generated CPU-Accelerator Systems

Published: 20 June 2018

Abstract

Modern HLS tools can generate hybrid software-accelerator systems that target architectures containing both CPU and FPGA resources. However, given a particular application, it is often unclear how best to distribute the workload between the FPGA and the processor. This paper investigates the use of CPU performance counters for estimating the quality of hybrid CPU-accelerator systems generated by HLS tools. We find that although this method enables a rough order-of-magnitude performance estimation, it is rarely sufficient for the automatic selection of good accelerators. We show that accurate estimates can be achieved with a model that is aware of the HLS tool's capabilities, estimating accelerator performance to within 5% on average.

Published In

HEART '18: Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies
June 2018
125 pages
ISBN:9781450365420
DOI:10.1145/3241793
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HEART 2018

Acceptance Rates

Overall Acceptance Rate 22 of 50 submissions, 44%

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 16 Feb 2025

Cited By

  • (2024) Automating application-driven customization of ASIPs: A survey. Journal of Systems Architecture, 148 (103080). DOI: 10.1016/j.sysarc.2024.103080. Online publication date: Mar-2024
  • (2022) Using HW/SW Codesign for Deep Neural Network Hardware Accelerator Targeting Low-Resources Embedded Processors. IEEE Access, 10 (22274-22287). DOI: 10.1109/ACCESS.2022.3153119. Online publication date: 2022
  • (2021) Heterogeneous Multi-Agent System for Brain-Computer Interaction in Routing and Forwarding with Memristive Neuron Networks. IEEE Transactions on Parallel and Distributed Systems (1-1). DOI: 10.1109/TPDS.2021.3137837. Online publication date: 2021
  • (2020) Towards Automatic High-Level Code Deployment on Reconfigurable Platforms: A Survey of High-Level Synthesis Tools and Toolchains. IEEE Access, 8 (174692-174722). DOI: 10.1109/ACCESS.2020.3024098. Online publication date: 2020
  • (2019) Compiler-Assisted Selection of Hardware Acceleration Candidates from Application Source Code. 2019 IEEE 37th International Conference on Computer Design (ICCD) (129-137). DOI: 10.1109/ICCD46524.2019.00024. Online publication date: Nov-2019
