ABSTRACT
In recent years, memory-intensive operations have become a dominant factor in the efficiency of novel neural networks. Just-in-time operator fusion on accelerators such as GPUs has proven an effective optimization for memory-intensive operations, and it adapts well to the wide variety of model structures. In particular, we find that memory-intensive operations on tensor views are ubiquitous in neural network implementations. Tensors are the de facto representation of numerical data in deep learning, and tensor views provide a rich set of syntax that allows the underlying tensor data to be interpreted in various ways without memory copies. Supporting views in deep learning compilers can greatly enlarge the scope of operator fusion and is attractive for optimizing novel neural networks. Nevertheless, mainstream solutions in state-of-the-art deep learning compilers fall short either in representing view syntax or in fusing operators across views. In this article, we propose EasyView, which enables and schedules tensor views in an end-to-end workflow from neural networks down to devices. Aiming to maximize memory utilization and reduce data movement, we categorize the view contexts that arise in high-level languages and lower views according to each scenario. The reference semantics of views are preserved when lowering from native high-level language features to intermediate representations. Based on these preserved reference semantics, memory activities related to read and write dependences are tracked for further compute and memory optimization. In addition, extensive operator fusion is applied to memory-intensive operations involving views. In our tests, the proposed work achieves average speedups of 5.63X, 2.44X, and 4.67X over XLA, JAX, and TorchScript, respectively, on hotspot Python functions. Furthermore, operator fusion with views brings an 8.02% performance improvement on end-to-end neural networks.
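The view behavior described above can be illustrated with NumPy, whose ndarray views exhibit the same reference semantics the abstract discusses (a minimal sketch for illustration, not EasyView's implementation):

```python
import numpy as np

# A view reinterprets the same underlying buffer without copying it.
base = np.arange(12, dtype=np.float32)

# reshape, transpose, and basic slicing all return views that
# share base's memory rather than allocating new storage.
m = base.reshape(3, 4)   # view: new shape, same data
t = m.T                  # view: transposed strides, same data
col = m[:, 1]            # view: strided slice, same data

# A write through one view is visible through every other view of
# the same buffer -- this is why a compiler must track read/write
# dependences across views before fusing operators.
col[:] = -1.0
assert base[1] == -1.0 and base[5] == -1.0 and base[9] == -1.0

# np.shares_memory confirms that no copy was made.
assert np.shares_memory(base, m) and np.shares_memory(base, t)
```

A compiler that lowers such views as opaque copies would both waste memory and break this aliasing behavior, which is why preserving reference semantics in the intermediate representation matters for correctness as well as for fusion scope.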
- Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, 2016. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265–283.
- Akshay Agrawal, Akshay Naresh Modi, Alexandre Passos, Allen Lavoie, Ashish Agarwal, Asim Shankar, Igor Ganichev, Josh Levenberg, Mingsheng Hong, Rajat Monga, 2019. TensorFlow Eager: A multi-stage, Python-embedded DSL for machine learning. arXiv preprint arXiv:1903.01855 (2019).
- Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. 2019. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9157–9166.
- Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In European Conference on Computer Vision. Springer, 213–229.
- Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578–594.
- Roy Frostig, Matthew James Johnson, and Chris Leary. 2018. Compiling machine learning programs via high-level tracing. Systems for Machine Learning (2018).
- Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. 2961–2969.
- JAX authors. 2020. JAX Quickstart. https://jax.readthedocs.io/en/latest/notebooks/quickstart.html.
- Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. 2015. Numba: A LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC. 1–6.
- Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling compiler infrastructure for domain specific computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 2–14.
- Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Luning Cao, and Xiao Wang. 2019. AutoFFT: A template-based FFT codes auto-generation framework for ARM and X86 CPUs. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1–15.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.
- mmdetection contributors. 2022. MMDetection Quickstart. https://github.com/open-mmlab/mmdetection.
- NVIDIA Corporation. 2022. Quickstart. https://developer.nvidia.com/cudnn.
- Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, 2019. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32 (2019), 8026–8037.
- Bo Qiao, Oliver Reiche, Frank Hannig, and Jürgen Teich. 2018. Automatic kernel fusion for image processing DSLs. In Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems. 76–85.
- Bo Qiao, Oliver Reiche, Frank Hannig, and Jürgen Teich. 2019. From loop fusion to kernel fusion: A domain-specific approach to locality optimization. In 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 242–253.
- Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices 48, 6 (2013), 519–530.
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 (2015), 91–99.
- Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. 1988. Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 12–27.
- Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibson, James Hegeman, Meghan Lele, Roman Levenstein, 2018. Glow: Graph lowering compiler techniques for neural networks. arXiv preprint arXiv:1805.00907 (2018).
- The CUTLASS Contributors. 2021. Quickstart. https://github.com/NVIDIA/cutlass/blob/master/media/docs/quickstart.md.
- The NumPy Community. 2021. Indexing on ndarrays. https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing.
- The Torch Contributors. 2019. TorchScript. https://pytorch.org/docs/stable/jit.html.
- The XLA Community. 2021. XLA: Optimizing Compiler for Machine Learning. https://www.tensorflow.org/xla.
- Stefan Van Der Walt, S Chris Colbert, and Gael Varoquaux. 2011. The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering 13, 2 (2011), 22–30.
- Aravind Vasudevan, Andrew Anderson, and David Gregg. 2017. Parallel multi channel convolution using general matrix multiplication. In 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 19–24.
- Mohamed Wahib and Naoya Maruyama. 2014. Scalable kernel fusion for memory-bound GPU applications. In SC'14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 191–202.
- Wolfram Language contributors. 2022. Wolfram Language Guide: Functional Programming. https://reference.wolfram.com/language/guide/FunctionalProgramming.html.
- Haicheng Wu, Gregory Diamos, Srihari Cadambi, and Sudhakar Yalamanchili. 2012. Kernel weaver: Automatically fusing database primitives for efficient GPU computation. In 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, 107–118.
- Da Yan, Wei Wang, and Xiaowen Chu. 2020. Optimizing batched Winograd convolution on GPUs. In Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 32–44.
- Jie Zhao and Peng Di. 2020. Optimizing the memory hierarchy by compositing automatic transformations on computations and data. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 427–441.
- Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, 2020. Ansor: Generating high-performance tensor programs for deep learning. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). 863–879.
- Zhen Zheng, Xuanda Yang, Pengzhan Zhao, Guoping Long, Kai Zhu, Feiwen Zhu, Wenyi Zhao, Xiaoyong Liu, Jun Yang, Jidong Zhai, 2022. AStitch: Enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. 359–373.
Index Terms
- EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers