Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper proposes a new pure-attention model, the Aggregated Pyramid Vision Transformer (APVT), for computer vision applications. Based on the Vision Transformer (ViT) architecture, APVT adopts the classic pyramid architecture of CNNs and replaces the traditional encoder with a group encoder for feature enhancement. APVT uses the split-transform-merge strategy to refine the group encoder operation. The model is verified on image classification with the CIFAR-10 dataset and on object detection with the COCO 2017 dataset. Experimental results show that APVT achieves excellent performance compared with other Transformer network architectures.
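
The abstract does not specify how the group encoder is built, so the following is only a minimal sketch of the general split-transform-merge idea applied to self-attention, not the paper's implementation. It assumes the channel dimension is split into equal groups, each group is transformed by its own single-head self-attention, and the group outputs are merged by concatenation plus a residual connection. The names `split_transform_merge`, `self_attention`, and `num_groups`, and the randomly initialised weights, are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention over the tokens in x.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def split_transform_merge(tokens, num_groups, rng):
    # tokens: array of shape (num_tokens, dim).
    num_tokens, dim = tokens.shape
    if dim % num_groups != 0:
        raise ValueError("dim must be divisible by num_groups")
    group_dim = dim // num_groups

    # Split: partition the channel dimension into equal groups.
    groups = np.split(tokens, num_groups, axis=1)

    # Transform: each group gets its own attention weights
    # (random here, purely for illustration; assumed, not from the paper).
    outputs = []
    for group in groups:
        wq, wk, wv = (
            rng.standard_normal((group_dim, group_dim)) / np.sqrt(group_dim)
            for _ in range(3)
        )
        outputs.append(self_attention(group, wq, wk, wv))

    # Merge: concatenate the group outputs and add a residual connection.
    merged = np.concatenate(outputs, axis=1)
    return tokens + merged


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 64))  # 16 patch tokens, 64 channels
out = split_transform_merge(tokens, num_groups=4, rng=rng)
print(out.shape)
```

Under these assumptions, each group attends over all tokens but only within its own slice of channels, so the output keeps the input shape and can feed the next stage of a pyramid.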
Date of Conference: 06-08 July 2022
Date Added to IEEE Xplore: 01 September 2022
Conference Location: Taipei, Taiwan
