LLVM Compiler Implementation for Explicit Parallelization and SIMD Vectorization

Authors:

Jin Lin,

Eric Garcia

LLVM-HPC'17: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC

Article No.: 4, Pages 1 - 11

https://doi.org/10.1145/3148173.3148191

Published: 12 November 2017

Abstract

With advances in modern multi-core processors and accelerators, applications are increasingly turning to compiler-assisted parallel and vector programming models such as OpenMP, OpenCL, Halide, Python, and TensorFlow. It is crucial to ensure that LLVM-based compilers can optimize parallel and vector code as effectively as possible. In this paper, we first present a set of updated LLVM IR extensions for explicitly parallel, vector, and offloading program constructs in the context of C/C++/OpenCL. Second, we describe our LLVM design and implementation for advanced OpenMP features such as parallel loop reduction, task and taskloop, and SIMD loops and functions, and we discuss the impact of our updated implementation on existing LLVM optimization passes. Finally, we present a reuse case of our infrastructure: enabling explicit parallelization and vectorization extensions in our OpenCL compiler, which achieves a ~35x speedup for a well-known autonomous driving workload on a multi-core platform configured with Intel® Xeon® Scalable Processors.
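For readers unfamiliar with the OpenMP features named in the abstract, the following is a minimal source-level sketch of a SIMD function, a SIMD loop, and a parallel loop reduction in C. It shows only the kind of input such a compiler has to handle. It is not taken from the paper and does not show the LLVM IR extensions the paper describes. The function names `scale_add`, `saxpy`, and `sum_reduce` are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* SIMD function: the pragma asks the compiler to also emit a vector
   variant that a vectorized caller loop can invoke.
   (Hypothetical example function, not from the paper.) */
#pragma omp declare simd
static float scale_add(float a, float x, float y) {
    return a * x + y;
}

/* SIMD loop: the pragma states that iterations may be executed
   in vector lanes. */
void saxpy(int n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        y[i] = scale_add(a, x[i], y[i]);
    }
}

/* Parallel loop reduction: each thread accumulates a private copy of
   total, and the copies are combined with + when the loop ends. */
double sum_reduce(int n, const double *v) {
    assert(v != NULL);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

When built with OpenMP enabled (for example with the `-fopenmp` flag in Clang or GCC), the pragmas take effect. Without it they are ignored and the functions run serially with the same results for this input, which makes the sketch easy to check.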

Published In

LLVM-HPC'17: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC

November 2017

106 pages

ISBN:9781450355650

DOI:10.1145/3148173

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SC '17

Sponsor:

SIGHPC

SC '17: The International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 17, 2017

Denver, CO, USA

Acceptance Rates

LLVM-HPC'17 Paper Acceptance Rate 9 of 10 submissions, 90%;

Overall Acceptance Rate 16 of 22 submissions, 73%
