STCO: Enhancing Training Efficiency via Structured Sparse Tensor Compilation Optimization

Published: 09 November 2024

Abstract

Network sparsification is an effective technique for accelerating Deep Neural Network (DNN) inference. However, existing sparsification techniques that rely on structured sparsity often yield limited benefits, primarily because the sparse storage formats they depend on introduce significant memory and computational overhead during address generation and gradient updates. In addition, many of these solutions target only the inference phase and neglect the crucial training phase.
In this article, we introduce STCO, a Sparse Tensor Compilation Optimization technique that significantly enhances training efficiency through structured sparse tensor compilation. Central to STCO is the Tensorization-aware Index Entity (TIE) format, which represents structured sparse tensors compactly by eliminating redundant indices and minimizing storage overhead. The TIE format underpins the Address-Carry flow (AC flow) pass, which optimizes data layout at the computational-graph level and enables more compact and efficient sparse tensor storage. A shape inference pass then uses the AC flow to derive optimized tensor shapes, further refining the performance of sparse tensor operations. Moreover, the Address-Carry TIE flow dynamically tracks nonzero addresses, extending the benefits of sparse optimization to both forward and backward propagation; this allows a smooth transition to sparse tensor compilation without significant modifications to existing codebases. To further boost training performance, we implement an operator-level AC flow optimization pass tailored to structured sparse tensors, which generates efficient addresses and keeps the computational overhead of sparse tensor operations minimal. The flexibility of STCO allows it to be integrated into various frameworks and compilers, providing a robust solution for enhancing training efficiency with structured sparse tensors. Experiments show that STCO achieves speedups of 3.64×, 5.43×, 4.89×, and 3.91× over state-of-the-art sparse formats on VGG16, ResNet-18, MobileNetV1, and MobileNetV2, respectively. These results underscore the efficiency of the proposed approach in leveraging structured sparsity to accelerate DNN training.
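
To make the idea of compact index storage and cheap address regeneration concrete, the sketch below stores a structured sparse weight matrix as compressed values plus a per-group offset array from which full nonzero addresses are regenerated on the fly. This is a hypothetical NumPy illustration using 2:4 structured sparsity; it is not the article's TIE format or AC flow implementation, and the function names (compress_2to4, regenerate_addresses) are invented for this example.

# Illustrative sketch only: a hypothetical 2:4 structured-sparse encoding in
# NumPy, not the TIE format or AC flow pass described in the article.
import numpy as np

def compress_2to4(w):
    """Keep the 2 largest-magnitude values in every group of 4 columns.

    Returns (values, idx): 'values' holds the kept entries (half the columns
    of w); 'idx' holds their in-group offsets, the only index data stored.
    """
    rows, cols = w.shape
    assert cols % 4 == 0
    groups = w.reshape(rows, cols // 4, 4)
    idx = np.argsort(-np.abs(groups), axis=-1)[..., :2]   # top-2 per group
    idx.sort(axis=-1)                                      # ascending offsets
    values = np.take_along_axis(groups, idx, axis=-1)
    return values.reshape(rows, cols // 2), idx.reshape(rows, cols // 2)

def regenerate_addresses(idx):
    """Recompute full column addresses from the compact per-group offsets.

    column = 4 * group_id + offset, so no per-element coordinate list
    (as in COO/CSR) needs to be stored or updated during training.
    """
    _, half = idx.shape
    group_id = np.repeat(np.arange(half // 2), 2)[None, :]  # broadcast over rows
    return 4 * group_id + idx

# Usage: compress a weight matrix, then scatter it back to dense form.
rng = np.random.default_rng(0)
w = rng.standard_normal((2, 8))
values, idx = compress_2to4(w)
cols = regenerate_addresses(idx)
dense = np.zeros_like(w)
np.put_along_axis(dense, cols, values, axis=1)   # pruned copy of w

At this toy scale, storing only in-group offsets mirrors the abstract's point that eliminating redundant indices keeps nonzero addresses cheap to track in both the forward and backward passes.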


    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 30, Issue 1
    January 2025, 360 pages
    EISSN: 1557-7309
    DOI: 10.1145/3697150
    • Editor: Jiang Hu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 November 2024
    Online AM: 21 October 2024
    Accepted: 02 October 2024
    Revised: 29 September 2024
    Received: 24 April 2024
    Published in TODAES Volume 30, Issue 1

    Author Tags

    1. Sparsity
    2. training
    3. compilation optimization

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
