Structured Operations: Modular Design of Code Generators for Tensor Compilers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13829)

Abstract

The performance of machine learning systems relies heavily on code generators tailored to tensor computations. We propose an approach to the design and implementation of such code generators that leverages the natural structure of tensor algebra, and we illustrate the progressive lowering of domain-specific abstractions in the MLIR infrastructure.
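As a concrete illustration (a minimal sketch of our own; the function name, shapes, and operand names are hypothetical and not taken from the paper), a structured tensor operation in MLIR's linalg dialect operates on immutable tensor values:

    // linalg.matmul is a "structured" operation: it carries its iteration
    // structure implicitly and yields a new result tensor instead of
    // mutating memory.
    func.func @matmul(%A: tensor<128x256xf32>, %B: tensor<256x64xf32>,
                      %C: tensor<128x64xf32>) -> tensor<128x64xf32> {
      %0 = linalg.matmul
             ins(%A, %B : tensor<128x256xf32>, tensor<256x64xf32>)
             outs(%C : tensor<128x64xf32>) -> tensor<128x64xf32>
      return %0 : tensor<128x64xf32>
    }

Progressive lowering rewrites such an operation step by step, e.g. into tiled loops, vector operations, and finally memory accesses, with each step expressed within MLIR.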


Notes

  1. This is analogous to the design of struct in LLVM IR: %1 = insertvalue {f64, f32, i32} %0, f32 42.0, 1 defines a new value %1 that holds the same elements as %0, except for the element at position 1, which now holds 42.0 (see the sketch after these notes).

  2. The operation also allows specifying sizes and strides, omitted here for simplicity (see the sketch after these notes).

  3. Some transformations, such as software pipelining, remain naturally attached to loops.
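The following sketch (ours, with hypothetical names, shapes, and constants) illustrates notes 1 and 2 in MLIR, assuming tensor.insert and tensor.extract_slice are representative of the operations the notes discuss:

    func.func @notes(%t: tensor<16xf32>, %i: index)
        -> (tensor<16xf32>, tensor<8xf32>) {
      // Note 1 analogy: like insertvalue on LLVM structs, tensor.insert
      // defines a new tensor %new that equals %t everywhere except at
      // position %i, which now holds 42.0.
      %c42 = arith.constant 42.0 : f32
      %new = tensor.insert %c42 into %t[%i] : tensor<16xf32>

      // Note 2: explicit offset [0], size [8], and stride [2]; the result
      // gathers every other element of %t.
      %slice = tensor.extract_slice %t[0] [8] [2]
          : tensor<16xf32> to tensor<8xf32>
      return %new, %slice : tensor<16xf32>, tensor<8xf32>
    }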


Author information


Correspondence to Albert Cohen.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vasilache, N. et al. (2023). Structured Operations: Modular Design of Code Generators for Tensor Compilers. In: Mendis, C., Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2022. Lecture Notes in Computer Science, vol 13829. Springer, Cham. https://doi.org/10.1007/978-3-031-31445-2_10

  • DOI: https://doi.org/10.1007/978-3-031-31445-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31444-5

  • Online ISBN: 978-3-031-31445-2

  • eBook Packages: Computer Science, Computer Science (R0)
