Structured Operations: Modular Design of Code Generators for Tensor Compilers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13829)

Abstract

The performance of machine learning systems relies heavily on code generators tailored to tensor computations. We propose an approach to the design and implementation of such code generators that leverages the natural structure of tensor algebra, and we illustrate the progressive lowering of domain-specific abstractions in the MLIR infrastructure.
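As a concrete illustration (a minimal sketch of our own; the function name, shapes, and operand names are hypothetical and not taken from the paper), a structured tensor operation in MLIR's linalg dialect operates on immutable tensor values:

    // linalg.matmul is a "structured" operation: it carries its iteration
    // structure implicitly and yields a new result tensor instead of
    // mutating memory.
    func.func @matmul(%A: tensor<128x256xf32>, %B: tensor<256x64xf32>,
                      %C: tensor<128x64xf32>) -> tensor<128x64xf32> {
      %0 = linalg.matmul
             ins(%A, %B : tensor<128x256xf32>, tensor<256x64xf32>)
             outs(%C : tensor<128x64xf32>) -> tensor<128x64xf32>
      return %0 : tensor<128x64xf32>
    }

Progressive lowering rewrites such an operation step by step, e.g. into tiled loops, vector operations, and finally memory accesses, with each step expressed within MLIR.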


Notes

  1. This is analogous to the design of struct in LLVM IR: %1 = insertvalue {f64, f32, i32} %0, f32 42.0, 1 defines a new value %1 that holds the same elements as %0, except for the element at position 1, which now holds 42.0 (see the sketch after these notes).

  2. The operation also allows specifying sizes and strides, omitted here for simplicity (see the sketch after these notes).

  3. Some transformations, such as software pipelining, remain naturally attached to loops.
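The following sketch (ours, with hypothetical names, shapes, and constants) illustrates notes 1 and 2 in MLIR, assuming tensor.insert and tensor.extract_slice are representative of the operations the notes discuss:

    func.func @notes(%t: tensor<16xf32>, %i: index)
        -> (tensor<16xf32>, tensor<8xf32>) {
      // Note 1 analogy: like insertvalue on LLVM structs, tensor.insert
      // defines a new tensor %new that equals %t everywhere except at
      // position %i, which now holds 42.0.
      %c42 = arith.constant 42.0 : f32
      %new = tensor.insert %c42 into %t[%i] : tensor<16xf32>

      // Note 2: explicit offset [0], size [8], and stride [2]; the result
      // gathers every other element of %t.
      %slice = tensor.extract_slice %t[0] [8] [2]
          : tensor<16xf32> to tensor<8xf32>
      return %new, %slice : tensor<16xf32>, tensor<8xf32>
    }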


Author information


Correspondence to Albert Cohen.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vasilache, N. et al. (2023). Structured Operations: Modular Design of Code Generators for Tensor Compilers. In: Mendis, C., Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2022. Lecture Notes in Computer Science, vol 13829. Springer, Cham. https://doi.org/10.1007/978-3-031-31445-2_10

  • DOI: https://doi.org/10.1007/978-3-031-31445-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31444-5

  • Online ISBN: 978-3-031-31445-2

  • eBook Packages: Computer Science, Computer Science (R0)
