HPViT: A Hybrid Visual Model with Feature Pyramid Transformer Structure

Abstract:

Recently, hybrid designs that fuse Transformers and CNNs have significantly improved model efficiency and accuracy. In this work, we propose a hybrid backbone network, the Hybrid Pyramid Vision Transformer (HPViT), which can be used for dense prediction tasks. Whereas ViT is designed for image classification, HPViT introduces the Transformer structure into a CNN and adopts a pyramid structure, which makes it suitable for dense prediction tasks such as detection and segmentation. Compared with ViT, HPViT has the following advantages: (1) In contrast to the high computational complexity and memory usage of ViT, HPViT can be trained on dense partitions of high-resolution images to capture sufficient detail, while also converging faster, occupying less memory, and using the pyramid structure to reduce the computation introduced by the Transformer structure; (2) HPViT combines the advantages of CNNs and Transformers and can be used as a general-purpose backbone; (3) Experiments show that HPViT performs well in image classification and object detection, with a top-1 accuracy of 81.2% on the ImageNet-1k dataset. In object detection, RetinaNet+HPViT fine-tuned on COCO for 12 epochs reached 34.3% AP, while RetinaNet+ResNet50 reached only 22.9% AP.
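The claim that a pyramid structure reduces Transformer computation can be illustrated with a rough count of token pairs in global self-attention. The sketch below is not HPViT's actual configuration, which the abstract does not give: the 224-pixel input, the stage strides of 4, 8, 16, and 32, and the helper names `num_tokens`, `attention_pairs`, and `total_pairs` are all assumptions chosen for illustration. It compares a progressively shrinking pyramid against a hypothetical design that keeps the finest resolution at every stage, counting one attention layer per stage.

```python
# Illustrative sketch only. The input size and stage strides are assumptions,
# not values taken from the HPViT paper.

IMAGE_SIZE = 224                   # assumed square input resolution
PYRAMID_STRIDES = [4, 8, 16, 32]   # assumed strides of a four-stage pyramid
FLAT_STRIDES = [4, 4, 4, 4]        # hypothetical: finest resolution kept throughout


def num_tokens(image_size: int, stride: int) -> int:
    """Number of tokens when a square image is downsampled by `stride`."""
    side = image_size // stride
    return side * side


def attention_pairs(tokens: int) -> int:
    """Global self-attention compares every token with every token."""
    return tokens * tokens


def total_pairs(image_size: int, strides: list[int]) -> int:
    """Sum of token pairs over all stages, one attention layer per stage."""
    return sum(attention_pairs(num_tokens(image_size, s)) for s in strides)


pyramid_cost = total_pairs(IMAGE_SIZE, PYRAMID_STRIDES)
flat_cost = total_pairs(IMAGE_SIZE, FLAT_STRIDES)

if __name__ == "__main__":
    for stride in PYRAMID_STRIDES:
        tokens = num_tokens(IMAGE_SIZE, stride)
        print(f"stride {stride:>2}: {tokens:>5} tokens, "
              f"{attention_pairs(tokens):>9} token pairs")
    print(f"pyramid / flat cost ratio: {pyramid_cost / flat_cost:.3f}")
```

Under these assumptions the first stage dominates the total, since attention cost grows with the square of the token count; this is why shrinking the feature map in later stages lowers the overall cost. The count ignores channel width, head count, and any attention variant the authors may use.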

Published in: 2023 8th International Conference on Control, Robotics and Cybernetics (CRC)

Date of Conference: 22-24 December 2023

Date Added to IEEE Xplore: 09 April 2024

DOI: 10.1109/CRC60659.2023.10488581

Conference Location: Changsha, China
