Deformable Modules for Flexible Feature Sampling on Vision Transformer | IEEE Conference Publication | IEEE Xplore

Abstract:

Vision transformers have shown that the self-attention mechanism performs well in computer vision. However, because such transformers operate on data sampled from fixed areas, their ability to learn the important features in images efficiently is limited. To compensate, we propose two modules based on the deformable operation: deformable patch embedding and deformable pooling. Deformable patch embedding is a hybrid of standard and deformable convolutions that adaptively samples features from an image. The deformable pooling module has a similar structure to the embedding module, but it not only samples data flexibly after self-attention but also allows the transformer to learn spatial information at various scales. The experimental results show that a transformer with the proposed modules converges faster and outperforms various vision transformers on image classification (ImageNet-1K) and object detection (MS-COCO).
Date of Conference: 29 November 2022 - 02 December 2022
Date Added to IEEE Xplore: 24 November 2022
Conference Location: Madrid, Spain
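
The abstract names the deformable operation but gives no implementation details. As a rough illustration only, the sketch below shows the general idea behind deformable sampling as it is commonly described for deformable convolutions: each tap of a fixed 3x3 grid is shifted by its own offset and read with bilinear interpolation. The function names, the hand-written offsets (which a real model would predict from the input), the uniform weights, and the zero padding outside the map are all assumptions made for this example, not the authors' modules.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2D feature map (list of rows) at a fractional (y, x) position.

    Assumption: positions outside the map contribute 0 (zero padding).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by their distance to (y, x).
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


def deformable_sample_3x3(feature, cy, cx, offsets, weights):
    """Weighted sum over a 3x3 grid centred at (cy, cx).

    Each of the nine taps is moved by its own (dy, dx) offset before sampling.
    With all offsets at zero this reduces to an ordinary fixed-grid 3x3 filter.
    """
    taps = [(gy, gx) for gy in (-1, 0, 1) for gx in (-1, 0, 1)]
    out = 0.0
    for (gy, gx), (oy, ox), wgt in zip(taps, offsets, weights):
        out += wgt * bilinear_sample(feature, cy + gy + oy, cx + gx + ox)
    return out


# Toy 4x4 feature map whose value at (r, c) is 4*r + c.
feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
mean_weights = [1 / 9] * 9  # hypothetical uniform kernel

# Fixed sampling: zero offsets give a plain 3x3 average around (1, 1).
fixed = deformable_sample_3x3(feature, 1, 1, [(0.0, 0.0)] * 9, mean_weights)

# Deformed sampling: every tap shifted half a pixel down and to the right.
shifted = deformable_sample_3x3(feature, 1, 1, [(0.5, 0.5)] * 9, mean_weights)

print(fixed, shifted)
```

On this toy map the fixed grid averages to about 5.0 and the shifted grid to about 7.5, which shows how offsets move the sampled region without changing the kernel weights.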
