Abstract:
This paper proposes a new pure-attention model, the Aggregated Pyramid Vision Transformer (APVT), for computer vision applications. Built on the Vision Transformer (ViT) architecture, APVT adopts the classic pyramid architecture of CNNs and replaces the traditional encoder with a group encoder for feature enhancement. The group encoder operation is refined with the split-transform-merge strategy. The model is verified on image classification with the CIFAR-10 dataset and on object detection with the COCO 2017 dataset. Experimental results show that APVT achieves excellent performance compared with other Transformer network architectures.
Date of Conference: 06-08 July 2022
Date Added to IEEE Xplore: 01 September 2022