Abstract
High-performance computing and deep learning workloads have motivated the design of domain-specific processors. Although these processors offer promising computational capability, they are notorious for their exotic programming paradigms. To improve programming productivity and fully exploit their performance potential, domain-specific compilers (DSCs) have been proposed. However, building a DSC for an emerging processor traditionally requires tremendous engineering effort because the commonly used compilation stack is difficult to reuse. With the advent of multi-level intermediate representation (MLIR), DSC developers can instead extend reusable infrastructure with customized functionality rather than rebuilding the entire compilation stack. In this paper, we further demonstrate the effectiveness of MLIR by extending its reusable infrastructure to a heterogeneous many-core processor, the Sunway processor. In particular, we design a new Sunway dialect and a corresponding backend that fully exploit the processor's architectural advantages while hiding its programming complexity. To show how easily a DSC can then be built, we combine the Sunway dialect with existing MLIR dialects to construct a stencil compiler for the Sunway processor. Experimental results show that this stencil compiler, built with a reusable approach, can even outperform state-of-the-art stencil compilers.
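To make the dialect-based design concrete, the sketch below shows how a simple stencil kernel might look once lowered onto a Sunway-style dialect, written in MLIR's generic textual form. This is a minimal illustration, not the paper's actual dialect: every sunway.* op name (sunway.launch, sunway.dma_get, sunway.alloc_ldm, sunway.dma_put, sunway.terminator) and its operand layout is an assumption of ours, while func, affine, arith, and memref are real upstream MLIR dialects reused unchanged.

// A hypothetical 1D three-point stencil mapped onto a Sunway-style dialect.
// Every sunway.* op is an illustrative assumption written in MLIR's generic
// op form; func, affine, arith, and memref are real upstream dialects.
func.func @jacobi_1d(%in: memref<1024xf64>, %out: memref<1024xf64>) {
  // Launch a kernel region to be executed by the compute processing
  // elements (CPEs) of the Sunway processor.
  "sunway.launch"() ({
    // Stage this CPE's tile (with one halo point on each side) from main
    // memory into local device memory (LDM) via DMA.
    %ldm_in  = "sunway.dma_get"(%in) : (memref<1024xf64>) -> memref<66xf64>
    %ldm_out = "sunway.alloc_ldm"()  : () -> memref<64xf64>
    // Compute entirely out of LDM, reusing the standard affine and arith
    // dialects for the loop nest and arithmetic.
    affine.for %i = 1 to 65 {
      %l  = affine.load %ldm_in[%i - 1] : memref<66xf64>
      %c  = affine.load %ldm_in[%i] : memref<66xf64>
      %r  = affine.load %ldm_in[%i + 1] : memref<66xf64>
      %s0 = arith.addf %l, %c : f64
      %s1 = arith.addf %s0, %r : f64
      %k  = arith.constant 0.333333 : f64
      %v  = arith.mulf %s1, %k : f64
      affine.store %v, %ldm_out[%i - 1] : memref<64xf64>
    }
    // Write the finished tile back to main memory.
    "sunway.dma_put"(%ldm_out, %out) : (memref<64xf64>, memref<1024xf64>) -> ()
    "sunway.terminator"() : () -> ()
  }) : () -> ()
  return
}

Because MLIR parses unregistered ops in generic form (for example, under mlir-opt --allow-unregistered-dialect), even a hypothetical sketch like this round-trips through the infrastructure; a real Sunway dialect would additionally register these ops with verification rules and lowering passes toward the Sunway toolchain.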
Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No. 2020YFB1506703), the National Natural Science Foundation of China (Grant Nos. 62072018, 61732002, U22A2028), the State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2021ZX-06), and the Fundamental Research Funds for the Central Universities (Grant No. YWF-22-L-1127).
Cite this article
Li, M., Liu, Y., Chen, B. et al. Building a domain-specific compiler for emerging processors with a reusable approach. Sci. China Inf. Sci. 67, 112101 (2024). https://doi.org/10.1007/s11432-022-3727-6