BinaryFormer: A Hierarchical-Adaptive Binary Vision Transformer (ViT) for Efficient Computing


Abstract:

Vision Transformers (ViTs) have recently demonstrated impressive nonlinear modeling capabilities and achieved state-of-the-art performance in various industrial applications, such as object recognition, anomaly detection, and robot control. However, their practical deployment can be hindered by high storage requirements and computational intensity. To alleviate these challenges, we propose a binary transformer called BinaryFormer, which quantizes the learned weights of the ViT module from 32-bit precision to 1 bit. Furthermore, we propose a hierarchical-adaptive architecture that replaces expensive matrix operations with cheaper addition and bit operations by switching between two attention modes. As a result, BinaryFormer effectively compresses the model size and reduces the computational cost of ViT. Experimental results on the ImageNet-1K benchmark show that BinaryFormer reduces the size of a typical ViT model by an average of 27.7× and converts over 99% of multiplication operations into bit operations while maintaining reasonable accuracy.
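The paper's exact binarization scheme is not reproduced on this page. As a rough illustration of the idea the abstract describes, the sketch below shows a generic 1-bit weight quantization in the style of XNOR-Net: each 32-bit weight matrix is replaced by ±1 codes plus a single scaling factor, so a matrix product reduces to scaled additions and subtractions. The function name and the mean-absolute-value scaling are illustrative assumptions, not BinaryFormer's actual method.

```python
import numpy as np

def binarize_weights(W):
    """Illustrative 1-bit quantization (not the paper's exact scheme):
    map a 32-bit weight matrix to {-1, +1} codes plus one scalar
    scaling factor chosen as the mean absolute weight."""
    alpha = float(np.abs(W).mean())   # recovers the magnitude lost by binarizing
    B = np.where(W >= 0, 1.0, -1.0)   # 1-bit codes
    return B, alpha

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)

B, alpha = binarize_weights(W)
# With binary weights, W @ x is approximated by alpha * (B @ x),
# which needs only additions/subtractions and one multiply by alpha.
approx = alpha * (B @ x)
```

In hardware, the `B @ x` step can go further: when activations are also binarized, each dot product becomes an XNOR followed by a popcount, which is the kind of bit operation the abstract refers to.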
Published in: IEEE Transactions on Industrial Informatics ( Volume: 20, Issue: 8, August 2024)
Page(s): 10657 - 10668
Date of Publication: 15 May 2024
