Building a domain-specific compiler for emerging processors with a reusable approach

  • Research Paper
  • From CAS & CAE Members
  • Published in Science China Information Sciences

Abstract

The high-performance computing and deep learning domains have been motivating the design of domain-specific processors. Although these processors offer promising computation capability, they are notorious for their exotic programming paradigms. To improve programming productivity and fully exploit their performance potential, domain-specific compilers (DSCs) have been proposed. However, building a DSC for an emerging processor requires tremendous engineering effort because the commonly used compilation stack is difficult to reuse. With the advent of multi-level intermediate representation (MLIR), DSC developers can leverage reusable infrastructure and extend it with customized functionality without rebuilding the entire compilation stack. In this paper, we further demonstrate the effectiveness of MLIR by extending its reusable infrastructure to embrace a heterogeneous many-core processor (the Sunway processor). In particular, we design a new Sunway dialect and a corresponding backend that fully exploit the architectural advantages of the processor while hiding its programming complexity. To show the ease of building a DSC, we combine the Sunway dialect with existing MLIR dialects to build a stencil compiler for the Sunway processor. Experimental results show that our stencil compiler, built with a reusable approach, can even outperform state-of-the-art stencil compilers.
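
As a concrete illustration of this dialect-based flow, the sketch below shows, in MLIR's textual IR, how a simple three-point stencil could be written against standard MLIR dialects (func, affine, arith) and offloaded through a custom Sunway dialect. The sunway.* operations and their syntax are illustrative assumptions made for this sketch only; they are not the operations actually defined in the paper.

    // Hypothetical sketch: a three-point stencil expressed with standard MLIR
    // dialects, with placeholder "sunway" ops marking where data would be
    // staged into the local device memory (LDM) of the compute processing
    // elements (CPEs). All sunway.* names are assumptions, not the paper's dialect.
    func.func @stencil_1d(%in: memref<1024xf64>, %out: memref<1024xf64>) {
      // Assumed op: offload the enclosed region to the 64 CPEs of a core group.
      sunway.parallel_region {
        // Assumed ops: DMA a tile of %in into per-CPE LDM and allocate an
        // LDM buffer for the results.
        %tile  = sunway.dma_get %in : memref<1024xf64> to memref<18xf64, 1>
        %local = sunway.ldm_alloc : memref<18xf64, 1>
        %third = arith.constant 0.3333333333333333 : f64
        affine.for %i = 1 to 17 {
          %l = affine.load %tile[%i - 1] : memref<18xf64, 1>
          %c = affine.load %tile[%i]     : memref<18xf64, 1>
          %r = affine.load %tile[%i + 1] : memref<18xf64, 1>
          %s = arith.addf %l, %c : f64
          %t = arith.addf %s, %r : f64
          %v = arith.mulf %t, %third : f64
          affine.store %v, %local[%i] : memref<18xf64, 1>
        }
        // Assumed op: DMA the computed tile back to main memory.
        sunway.dma_put %local, %out : memref<18xf64, 1> to memref<1024xf64>
        sunway.yield
      }
      return
    }

In an MLIR-based DSC, such target-specific operations would typically be introduced by rewriting higher-level (e.g., stencil-style) dialects and then progressively lowered toward LLVM IR by dialect-conversion passes, which is precisely where reusing the existing MLIR infrastructure saves engineering effort.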

Acknowledgements

This work was supported by National Key Research and Development Program of China (Grant No. 2020YFB1506703), National Natural Science Foundation of China (Grant Nos. 62072018, 61732002, U22A2028), State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2021ZX-06), and Fundamental Research Funds for the Central Universities (Grant No. YWF-22-L-1127).

Author information

Correspondence to Hailong Yang or Depei Qian.

About this article

Cite this article

Li, M., Liu, Y., Chen, B. et al. Building a domain-specific compiler for emerging processors with a reusable approach. Sci. China Inf. Sci. 67, 112101 (2024). https://doi.org/10.1007/s11432-022-3727-6
