Abstract:
Vision Transformer (ViT) has gained prominence for its performance in various vision tasks, but it comes with considerable computational and memory demands, which pose a challenge when deploying it on resource-constrained edge devices. To address this limitation, various token pruning methods have been proposed to reduce the computation. However, the majority of token pruning techniques are not designed for practical use on actual embedded devices, which demand a significant reduction in computational load. In this paper, we introduce ViT-ToGo, a ViT accelerator with grouped token pruning. Grouping enables the ViT model and the token pruning process to execute in parallel. We implement grouped token pruning with a head-wise importance estimator, which simplifies the processing needed for token pruning, including sorting and reordering. Our proposed method achieves up to a 66% reduction in the number of tokens, resulting in up to a 36% reduction in GFLOPs, with only a minimal accuracy drop of around 1%. Furthermore, the hardware implementation incurs a marginal resource overhead of 1.13% on average.
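The abstract does not give the scoring rule or the grouping scheme, so the following is only a minimal sketch of how grouped token pruning with a head-wise importance score might look in software. It rests on several assumptions that are not from the paper: importance is taken as the maximum class-token attention across heads, groups are contiguous and of fixed size, and a fixed number of tokens is kept per group. The function names `head_wise_importance` and `grouped_prune` are hypothetical.

```python
def head_wise_importance(cls_attn):
    """Score each patch token from per-head class-token attention.

    cls_attn[h][n] is the attention weight from the class token to
    patch token n in head h. Taking the max over heads is an
    assumption made for this sketch, not the paper's stated rule.
    """
    num_tokens = len(cls_attn[0])
    return [max(head[n] for head in cls_attn) for n in range(num_tokens)]


def grouped_prune(scores, group_size, keep_per_group):
    """Keep the top `keep_per_group` tokens inside each contiguous group.

    Each group is ranked independently, so no global sort is needed,
    and kept indices are emitted in their original order, so no global
    reordering is needed either. Group layout is an assumption.
    """
    kept = []
    for start in range(0, len(scores), group_size):
        group = range(start, min(start + group_size, len(scores)))
        # Local ranking over at most `group_size` tokens.
        top = sorted(group, key=lambda i: scores[i], reverse=True)[:keep_per_group]
        # Restore original token order within the group.
        kept.extend(sorted(top))
    return kept


# Toy input: 2 heads, 8 patch tokens (values are illustrative only).
cls_attn = [
    [0.30, 0.05, 0.10, 0.05, 0.20, 0.05, 0.15, 0.10],
    [0.05, 0.10, 0.40, 0.05, 0.05, 0.25, 0.05, 0.05],
]
scores = head_wise_importance(cls_attn)
kept = grouped_prune(scores, group_size=4, keep_per_group=2)
print(kept)
```

Because every group is processed independently, the per-group ranking could in principle run alongside other computation, which is the kind of parallelism the abstract attributes to grouping; how ViT-ToGo actually schedules this in hardware is not described here.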
Date of Conference: 25-27 March 2024
Date Added to IEEE Xplore: 10 June 2024