DOI: 10.1145/3620665.3640385
research-article
Open Access

Hydride: A Retargetable and Extensible Synthesis-based Compiler for Modern Hardware Architectures

Published: 27 April 2024

ABSTRACT

As modern hardware architectures evolve to support increasingly diverse and complex instruction sets that meet the performance demands of workloads such as image processing and deep learning, it has become ever more crucial for compilers to support the evolution of their internal abstractions and to provide retargetable code generation that keeps pace with emerging instruction sets. We propose Hydride, a novel approach to compiling for complex, emerging hardware architectures. Hydride uses vendor-defined pseudocode specifications of multiple hardware ISAs to automatically design two things: retargetable instructions for AutoLLVM IR, an extensible compiler IR consisting of (formally defined) language-independent and target-independent LLVM IR instructions for compiling to those ISAs, and instruction selection passes that lower AutoLLVM IR to each of the specified hardware ISAs. Hydride also includes a code synthesizer that automatically generates code generation support for schedule-based languages such as Halide, so that they optimally generate AutoLLVM IR. Our results show that Hydride can represent 3,557 instructions across the x86, Hexagon, and ARM architectures, spanning the (Intel) SSE2, SSE4, AVX, AVX2, and AVX512, (Qualcomm) Hexagon HVX, and (ARM) NEON vector ISAs, using only 397 AutoLLVM IR instructions. We built a new Halide compiler with Hydride from only a formal semantics of Halide IR, leveraging the auto-generated AutoLLVM IR and back ends for the three hardware architectures. Across kernels from deep learning and image processing, this compiler performs just as well as the mature, production Halide compiler on Hexagon and outperforms it by 8% on x86 and by 3% on ARM. Hydride also outperforms the production Halide compiler's LLVM back end by 12% on x86, 100% on HVX, and 26% on ARM across the same kernels.
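One way to read the 3,557-to-397 result is that many vendor instructions differ only in parameters such as register width and element size, so a small set of parameterized IR instructions can cover them, and lowering rules fall out of the grouping. The Python sketch below is a toy illustration of that idea only, under loud assumptions: the six instruction specs are hand-written simplifications (not parsed from vendor manuals), the IR op names `vadd` and `vsub` and the helpers `vendor_spec`, `ir_op`, `lift`, and `build` are hypothetical and are not actual AutoLLVM IR or Hydride code, HVX is assumed to run in 128-byte mode, and equivalence is checked on random inputs, whereas the abstract describes formally defined semantics.

```python
import operator
import random
from collections import defaultdict


def field(x: int, lo: int, n: int) -> int:
    """Bits [lo+n-1 : lo] of x, mirroring vendor pseudocode bit slices."""
    return (x >> lo) & ((1 << n) - 1)


def vendor_spec(width: int, elem: int, op):
    """Hypothetical hand-written stand-in for one vendor pseudocode entry:
    FOR i := 0 to width-1 STEP elem: dst[i+elem-1:i] := op(a[i+elem-1:i], b[i+elem-1:i])
    """
    def sem(a: int, b: int) -> int:
        dst = 0
        for i in range(0, width, elem):
            lane = op(field(a, i, elem), field(b, i, elem)) & ((1 << elem) - 1)
            dst |= lane << i
        return dst
    return sem


# (ISA, intrinsic) -> (register width in bits, semantics); HVX assumes 128-byte mode.
TARGET = {
    ("x86", "_mm_add_epi16"): (128, vendor_spec(128, 16, operator.add)),
    ("x86", "_mm256_add_epi32"): (256, vendor_spec(256, 32, operator.add)),
    ("arm", "vaddq_s16"): (128, vendor_spec(128, 16, operator.add)),
    ("hvx", "Q6_Vh_vadd_VhVh"): (1024, vendor_spec(1024, 16, operator.add)),
    ("x86", "_mm_sub_epi8"): (128, vendor_spec(128, 8, operator.sub)),
    ("arm", "vsubq_s8"): (128, vendor_spec(128, 8, operator.sub)),
}


def ir_op(op):
    """Hypothetical parameterized IR instruction over `lanes` lanes of `bits` bits."""
    def sem(lanes: int, bits: int, a: int, b: int) -> int:
        mask = (1 << bits) - 1
        out = 0
        for k in range(lanes):
            x, y = (a >> (k * bits)) & mask, (b >> (k * bits)) & mask
            out |= (op(x, y) & mask) << (k * bits)  # wrap each lane independently
        return out
    return sem


IR_OPS = {"vadd": ir_op(operator.add), "vsub": ir_op(operator.sub)}


def lift(width: int, spec, rng: random.Random, trials: int = 50):
    """Find (ir_op, lanes, bits) agreeing with `spec` on random inputs (a test, not a proof)."""
    inputs = [(rng.getrandbits(width), rng.getrandbits(width)) for _ in range(trials)]
    for name, sem in IR_OPS.items():
        for bits in (8, 16, 32, 64):
            lanes = width // bits
            if all(sem(lanes, bits, a, b) == spec(a, b) for a, b in inputs):
                return name, lanes, bits
    return None


def build(rng: random.Random):
    groups = defaultdict(list)  # IR op -> target instructions it covers
    select = {}                 # (ISA, IR op, lanes, bits) -> intrinsic to emit
    for (isa, insn), (width, spec) in TARGET.items():
        found = lift(width, spec, rng)
        if found is not None:
            op, lanes, bits = found
            groups[op].append(insn)
            select[(isa, op, lanes, bits)] = insn
    return groups, select


groups, select = build(random.Random(0))
for op, insns in groups.items():
    print(f"{op} covers {insns}")
print(select[("hvx", "vadd", 64, 16)])  # lowering-rule lookup for HVX
```

In this toy, six target instructions collapse into two parameterized IR ops, and the `select` table plays the role of an instruction selection rule: given a target ISA, an IR op, a lane count, and an element width, it returns the intrinsic to emit.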
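The abstract also describes a code synthesizer that generates the lowering from a source language's IR into AutoLLVM IR. As a rough, hypothetical illustration of what synthesizing such a lowering can involve, the sketch below searches for the composition of lane-wise IR ops with the fewest operations (a stand-in for a real cost model) that is equivalent to a Halide-style uint8 halving add. The op names `vadd`, `vand`, `vor`, `vxor`, and `vshr1` and all helper names are invented here. The search is bottom-up enumeration with observational-equivalence pruning inside a simple counterexample-guided loop that verifies each candidate exhaustively over all 256 × 256 inputs. This is a substitute for, not a reproduction of, Hydride's synthesizer, whose algorithm the abstract does not describe.

```python
import itertools
import random

MASK = 0xFF  # uint8 lanes

# Hypothetical lane-wise IR ops; the names are invented for this sketch.
UNARY = {"vshr1": lambda x: x >> 1}
BINARY = {
    "vadd": lambda x, y: (x + y) & MASK,  # wrapping add
    "vand": lambda x, y: x & y,
    "vor": lambda x, y: x | y,
    "vxor": lambda x, y: x ^ y,
}
ENV = {**UNARY, **BINARY}


def halving_add(a: int, b: int) -> int:
    """Source-level spec: uint8 (a + b) >> 1, computed without overflow."""
    return (a + b) >> 1


def smallest_match(spec, tests, max_ops):
    """Bottom-up enumeration by operation count. Programs are deduplicated by
    their outputs on `tests` (observational equivalence) to keep levels small."""
    target = tuple(spec(a, b) for a, b in tests)
    levels = [{}]
    for name, idx in (("a", 0), ("b", 1)):
        levels[0].setdefault(tuple(t[idx] for t in tests), name)
    if target in levels[0]:
        return levels[0][target], 0
    seen = set(levels[0])
    for k in range(1, max_ops + 1):
        level = {}

        def candidates():
            for sig, e in levels[k - 1].items():
                for name, f in UNARY.items():
                    yield tuple(map(f, sig)), f"{name}({e})"
            for i in range(k):  # operands use i and k - 1 - i operations
                pairs = itertools.product(levels[i].items(), levels[k - 1 - i].items())
                for (s1, e1), (s2, e2) in pairs:
                    for name, f in BINARY.items():
                        yield tuple(map(f, s1, s2)), f"{name}({e1}, {e2})"

        for sig, expr in candidates():
            if sig == target:
                return expr, k  # first hit at the smallest level
            if sig not in seen and sig not in level:
                level[sig] = expr
        seen.update(level)
        levels.append(level)
    return None


def synthesize(spec, max_ops=4, seed=0):
    """Counterexample-guided loop: search on a few tests, verify exhaustively
    over all uint8 input pairs, and add any failing pair before retrying."""
    rng = random.Random(seed)
    tests = [(255, 255), (255, 0), (0, 0), (1, 1)]
    tests += [(rng.randrange(256), rng.randrange(256)) for _ in range(32)]
    env = dict(ENV)
    while True:
        found = smallest_match(spec, tests, max_ops)
        if found is None:
            return None
        expr, ops = found
        code = compile(expr, "<candidate>", "eval")
        cex = next(
            ((a, b) for a in range(256) for b in range(256)
             if eval(code, env, {"a": a, "b": b}) != spec(a, b)),
            None,
        )
        if cex is None:
            return expr, ops
        tests.append(cex)


expr, ops = synthesize(halving_add)
print(f"{ops} ops: {expr}")
```

A known four-operation solution is (a & b) + ((a ^ b) >> 1), which follows from a + b = (a ^ b) + 2(a & b), so the search is guaranteed to succeed with `max_ops=4`; the exhaustive check is feasible only because uint8 inputs are small.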


  • Published in

    ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
    April 2024
    1299 pages
    ISBN:9798400703850
    DOI:10.1145/3620665

    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States




    Acceptance Rates

    Overall acceptance rate: 535 of 2,713 submissions, 20%
  • Article Metrics

    • Downloads (last 12 months): 64
    • Downloads (last 6 weeks): 64
