research-article

PriMax: maximizing DSL application performance with selective primitive acceleration

Authors:
Nicholas Wendt, University of Michigan
Todd Austin, University of Michigan
Valeria Bertacco, University of Michigan

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference, July 2022, Pages 139–144. https://doi.org/10.1145/3489517.3530431

Published: 23 August 2022

ABSTRACT

Domain-specific languages (DSLs) improve developer productivity by abstracting away low-level details of an algorithm's implementation within a specialized domain. These languages often provide powerful primitives to describe complex operations, potentially granting flexibility during compilation to target hardware acceleration. This work proposes PriMax, a novel methodology to effectively map DSL applications to hardware accelerators. It builds decision trees based on benchmark results, which select between distinct implementations of accelerated primitives to maximize a target performance metric. In our graph analytics case study with two accelerators, PriMax produces a geometric mean speedup of 1.57x over a multicore CPU, higher than either target accelerator alone, and approaching the maximum 1.58x speedup attainable with these target accelerators.
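The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names (`vertices`, `avg_degree`), the target names (`cpu`, `acc_a`, `acc_b`), the greedy tree-building procedure, and every runtime below are assumptions made up for illustration. The sketch builds a small decision tree from benchmark runtimes, uses it to pick one implementation per input, and compares the resulting geometric mean speedup over the CPU with each accelerator used alone and with an oracle that always picks the fastest target.

```python
import math

TARGETS = ("cpu", "acc_a", "acc_b")  # hypothetical implementation targets

# Hypothetical benchmark results: (input features, runtime in ms per target).
BENCH = [
    ({"vertices": 1e3, "avg_degree": 2.0},  {"cpu": 1.0,  "acc_a": 2.0, "acc_b": 4.0}),
    ({"vertices": 2e3, "avg_degree": 30.0}, {"cpu": 2.0,  "acc_a": 3.0, "acc_b": 5.0}),
    ({"vertices": 1e6, "avg_degree": 2.0},  {"cpu": 8.0,  "acc_a": 2.0, "acc_b": 4.0}),
    ({"vertices": 2e6, "avg_degree": 3.0},  {"cpu": 16.0, "acc_a": 4.0, "acc_b": 8.0}),
    ({"vertices": 1e6, "avg_degree": 40.0}, {"cpu": 8.0,  "acc_a": 4.0, "acc_b": 1.0}),
    ({"vertices": 2e6, "avg_degree": 50.0}, {"cpu": 16.0, "acc_a": 8.0, "acc_b": 2.0}),
]

def cost(rows, target):
    """Sum of log runtimes; minimizing it maximizes geometric mean speedup."""
    return sum(math.log(runtimes[target]) for _, runtimes in rows)

def best_leaf(rows):
    """Single target that is best for all rows reaching this node."""
    return min(TARGETS, key=lambda t: cost(rows, t))

def build(rows, depth=2):
    """Greedily grow a tree; a leaf is a target name, a node is a tuple."""
    leaf = best_leaf(rows)
    best_cost, best_split = cost(rows, leaf), None
    if depth > 0:
        for feat in rows[0][0]:
            values = sorted({f[feat] for f, _ in rows})
            for lo, hi in zip(values, values[1:]):
                thr = (lo + hi) / 2  # midpoint between adjacent values
                left = [r for r in rows if r[0][feat] <= thr]
                right = [r for r in rows if r[0][feat] > thr]
                c = cost(left, best_leaf(left)) + cost(right, best_leaf(right))
                if c < best_cost - 1e-12:  # keep only strict improvements
                    best_cost, best_split = c, (feat, thr, left, right)
    if best_split is None:
        return leaf
    feat, thr, left, right = best_split
    return (feat, thr, build(left, depth - 1), build(right, depth - 1))

def select(tree, features):
    """Walk the tree to choose an implementation for one input."""
    while not isinstance(tree, str):
        feat, thr, left, right = tree
        tree = left if features[feat] <= thr else right
    return tree

def geomean_speedup(rows, choose):
    """Geometric mean of cpu_runtime / chosen_runtime over all rows."""
    logs = [math.log(rt["cpu"] / rt[choose(f, rt)]) for f, rt in rows]
    return math.exp(sum(logs) / len(logs))

TREE = build(BENCH)
SPEEDUPS = {
    "acc_a only": geomean_speedup(BENCH, lambda f, rt: "acc_a"),
    "acc_b only": geomean_speedup(BENCH, lambda f, rt: "acc_b"),
    "tree": geomean_speedup(BENCH, lambda f, rt: select(TREE, f)),
    "oracle": geomean_speedup(BENCH, lambda f, rt: min(rt, key=rt.get)),
}

if __name__ == "__main__":
    for name, value in SPEEDUPS.items():
        print(f"{name}: {value:.2f}x")
```

For brevity the tree is evaluated on the same rows it was built from; a realistic evaluation would hold out separate benchmark inputs. The cost function sums log runtimes because minimizing that sum is equivalent to maximizing the geometric mean speedup, the metric reported in the abstract.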

Published in
DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference, July 2022, 1462 pages
ISBN: 9781450391429
DOI: 10.1145/3489517
General Chair: Rob Oshana, NXP
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States