Vision Transformer Inference on a CNN Accelerator


Abstract:

Following the remarkable performance demonstrated by the Transformer architecture in computer vision as well as natural language processing (NLP), there is growing demand for embedded systems that can efficiently execute Vision Transformer (ViT) applications alongside Convolutional Neural Network (CNN) applications. Since CNN accelerators are already widely deployed commercially, this paper explores the possibility of using existing CNN accelerators to support ViT rather than developing a separate accelerator for each workload. CNN accelerators have inherent limitations in efficiently handling two kinds of Transformer operations: matrix multiplication (MM) with two non-constant matrices, and nonlinear operations. To overcome these limitations, we first propose a novel technique to handle MM operations efficiently in an adder-tree-type CNN accelerator without special reshaping hardware. Second, we propose a scheduling method that minimizes the idle time caused by offloading the Transformer's nonlinear operations. Additionally, we investigate the possibility of executing layer normalization and GELU on the accelerator with minor extensions. Experimental results validate the effectiveness of the proposed methods.
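The paper's own hardware extensions are not reproduced here, but the two nonlinear operations it discusses can be sketched in plain Python for reference. Note that the tanh-based GELU approximation shown below is a common hardware-friendly substitute for the exact erf-based definition; it is an illustrative assumption, not the method proposed in the paper.

```python
import math

def layer_norm(xs, eps=1e-5):
    # Layer normalization without the learned affine scale/shift, for
    # simplicity: normalize a vector to zero mean and unit variance.
    # On an accelerator this requires mean and variance reductions.
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def gelu_exact(x):
    # Exact GELU: x * Phi(x), using the Gaussian CDF via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh-based approximation, often preferred when erf hardware
    # is unavailable; accurate to roughly 1e-3 over typical ranges.
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0, 4.0]))
    for v in (-2.0, 0.0, 1.0, 2.0):
        print(f"{v:+.1f}: exact={gelu_exact(v):.6f} approx={gelu_tanh(v):.6f}")
```

The closeness of the two GELU variants is one reason such functions are plausible candidates for "minor extensions" to fixed-function accelerator datapaths, as the abstract suggests.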
Date of Conference: 18-20 November 2024
Date Added to IEEE Xplore: 02 January 2025
Conference Location: Milan, Italy
