Abstract:
In recent years, the vision transformer (ViT) has achieved remarkable breakthroughs in fine-grained visual classification (FGVC) because its self-attention mechanism excels at extracting distinctive features from different pixels. However, the pure ViT falls short in capturing the multi-scale, local, and low-layer features that are crucial for FGVC. To compensate for these shortcomings, a new hybrid network called HVCNet is designed, which fuses the advantages of the ViT and convolutional neural networks (CNN). Three modifications are made to the original ViT: 1) a multi-scale image-to-tokens (MIT) module is used instead of directly tokenizing the raw input image, enabling the network to capture features at different scales; 2) the feed-forward network in the ViT encoder is replaced with a mixed convolution feed-forward (MCF) module, which enhances the network's ability to capture local and multi-scale features; 3) a multi-layer feature selection (MFS) module is designed to prevent the deep-layer tokens in the ViT from ignoring local and low-layer features. The experimental results indicate that the proposed method surpasses state-of-the-art methods on publicly available datasets.
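The abstract does not spell out the MIT module's internal design, so the following PyTorch sketch only illustrates the general idea of multi-scale image-to-tokens stems: parallel convolutional branches with different kernel sizes (but the same stride) tokenize the image at several receptive-field scales before the token sequences are fused. The class name, kernel sizes, channel split, and concatenation-based fusion are all illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of a multi-scale image-to-tokens stem (MIT-style).
# Kernel sizes, channel split, and concat fusion are assumptions.
import torch
import torch.nn as nn

class MultiScaleImageToTokens(nn.Module):
    def __init__(self, in_chans=3, embed_dim=768, patch=16,
                 kernel_sizes=(4, 8, 16)):
        super().__init__()
        assert embed_dim % len(kernel_sizes) == 0
        dim_per_branch = embed_dim // len(kernel_sizes)
        # Parallel conv branches: each sees the image at a different
        # receptive-field scale but emits the same token grid (stride = patch).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_chans, dim_per_branch, kernel_size=k,
                      stride=patch, padding=max(0, (k - patch) // 2))
            for k in kernel_sizes
        ])
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                      # x: (B, 3, H, W)
        feats = [b(x) for b in self.branches]  # each: (B, C/b, H/16, W/16)
        x = torch.cat(feats, dim=1)            # fuse scales along channels
        x = x.flatten(2).transpose(1, 2)       # (B, N, embed_dim) tokens
        return self.norm(x)

# Usage: tokens = MultiScaleImageToTokens()(torch.randn(2, 3, 224, 224))
# yields a (2, 196, 768) token sequence, a drop-in replacement for the
# plain 16x16 patch embedding of a standard ViT.
```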
Published in: IEEE Signal Processing Letters (Volume: 31)