Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures

Authors:
Akash Kothari, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA (https://orcid.org/0009-0009-0319-0333)
Abdul Rafae Noor, University of Illinois at Urbana-Champaign, Champaign, USA (https://orcid.org/0000-0002-9979-3252)
Muchen Xu, University of Illinois at Urbana-Champaign, Champaign, USA (https://orcid.org/0009-0001-3381-2190)
Hassam Uddin, University of Illinois at Urbana-Champaign, Champaign, USA (https://orcid.org/0009-0003-3777-4878)
Dhruv Baronia, University of Illinois at Urbana-Champaign, Champaign, USA (https://orcid.org/0009-0001-8557-8770)
Stefanos Baziotis, University of Illinois at Urbana-Champaign, Champaign, USA (https://orcid.org/0009-0001-4061-7094)
Vikram Adve, University of Illinois at Urbana-Champaign, Champaign, United States of America (https://orcid.org/0000-0002-0760-9690)
Charith Mendis, University of Illinois at Urbana-Champaign, Champaign, United States of America (https://orcid.org/0000-0002-8140-2321)
Sudipta Sengupta, Amazon AWS, Seattle, USA (https://orcid.org/0009-0001-6331-9524)

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, April 2024, Pages 514–529. https://doi.org/10.1145/3620665.3640385

Published: 27 April 2024

ABSTRACT

As modern hardware architectures evolve to support increasingly diverse and complex instruction sets to meet the performance demands of modern workloads such as image processing and deep learning, it has become ever more crucial for compilers to support the evolution of their internal abstractions and to provide retargetable code generation that keeps pace with emerging instruction sets. We propose Hydride, a novel approach to compiling for complex, emerging hardware architectures. Hydride uses vendor-defined pseudocode specifications of multiple hardware ISAs to automatically design retargetable instructions for AutoLLVM IR, an extensible compiler IR consisting of (formally defined) language-independent and target-independent LLVM IR instructions used to compile to those ISAs, and to automatically generate instruction selection passes that lower AutoLLVM IR to each of the specified hardware ISAs. Hydride also includes a code synthesizer that automatically generates code generation support for schedule-based languages, such as Halide, to optimally generate AutoLLVM IR. Our results show that Hydride is able to represent 3,557 instructions combined across the x86, Hexagon, and ARM architectures, including the (Intel) SSE2, SSE4, AVX, AVX2, AVX512, (Qualcomm) Hexagon HVX, and (ARM) NEON vector ISAs, using only 397 AutoLLVM IR instructions. We created a new Halide compiler with Hydride using only a formal semantics of Halide IR, leveraging the auto-generated AutoLLVM IR and back ends for the three hardware architectures. Across kernels from deep learning and image processing, this compiler performs just as well as the mature, production Halide compiler on Hexagon, and outperforms it by 8% on x86 and 3% on ARM. Hydride also outperforms the production Halide's LLVM back end by 12% on x86, 100% on HVX, and 26% on ARM across the same kernels.
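To make the idea of a retargetable instruction concrete, the following is a minimal Python sketch of how one parameterized operation might stand in for several target-specific vector instructions. It is an illustration only, not Hydride's actual IR or algorithm: the `VectorAdd` class, its `lanes` and `elem_bits` parameters, and the `TARGET_INSTANCES` table are hypothetical names introduced here. The intrinsics named in the table are real Intel and ARM intrinsics, but their grouping under one parameterized operation is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorAdd:
    """Hypothetical parameterized instruction: lane-wise wrapping integer add."""
    lanes: int       # number of elements per vector
    elem_bits: int   # width of each element in bits

    def apply(self, a, b):
        # Both operands must have exactly one value per lane.
        if len(a) != self.lanes or len(b) != self.lanes:
            raise ValueError("operand length must equal lane count")
        mask = (1 << self.elem_bits) - 1
        # Addition wraps around at the element width (no saturation).
        return [(x + y) & mask for x, y in zip(a, b)]


# Illustrative (assumed) mapping from target intrinsics to parameter choices.
TARGET_INSTANCES = {
    "_mm_add_epi16": VectorAdd(lanes=8, elem_bits=16),      # SSE2, 128-bit vectors
    "_mm256_add_epi16": VectorAdd(lanes=16, elem_bits=16),  # AVX2, 256-bit vectors
    "vaddq_s16": VectorAdd(lanes=8, elem_bits=16),          # NEON, 128-bit vectors
}


def distinct_parameterizations(instances):
    """Count how many distinct parameter settings cover the listed intrinsics."""
    return len(set(instances.values()))
```

In this sketch, three target intrinsics are covered by one operation with two parameter settings, which is the kind of sharing that lets a small set of IR instructions represent a much larger set of target instructions. Using the figures reported in the abstract, 3,557 target instructions over 397 AutoLLVM IR instructions is roughly 9 target instructions per IR instruction on average.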

Published in
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
April 2024, 1299 pages
ISBN: 9798400703850
DOI: 10.1145/3620665
General Chairs: Nael Abu-Ghazaleh, Rajiv Gupta
Program Chairs: Madan Musuvathi, Dan Tsafrir
License: This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifiers: research-article
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
