The Multimodal Scene Recognition Method Based on Self-Attention and Distillation

Abstract:

Scene recognition is a challenging task in computer vision because of the diversity of objects in scene images and the ambiguity of object layouts. In recent years, the emergence of multimodal scene data has provided new solutions for scene recognition, but it has also brought new problems. To address these challenges, this article proposes the self-attention and distillation-based multimodal scene recognition network (SAD-MSR). The backbone of the model adopts a pure self-attention transformer structure, which can extract both local and global spatial features of multimodal scene images. The model uses a multistage fusion mechanism: in the early stage, the concatenated tokens of the two modalities are fused by self-attention, and in the late stage, the high-level features extracted from the two modalities are fused by cross-attention. Furthermore, a distillation mechanism is introduced to alleviate the problem of a limited number of training samples. Finally, we conduct extensive experiments on two multimodal scene recognition databases, SUN RGB-D and NYU Depth, to show the effectiveness of SAD-MSR. In these experiments, our method achieves better results than other state-of-the-art multimodal scene recognition methods.
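
The two fusion stages described in the abstract can be illustrated with a minimal sketch. This is not the SAD-MSR implementation: it assumes plain scaled dot-product attention with identity query/key/value projections (no learned weights, no multiple heads), and the function names `early_fusion` and `late_fusion`, the token counts, and the feature size are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Assumption: identity projections, so no learned parameters are involved.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim)])
    return out


def early_fusion(rgb_tokens, depth_tokens):
    """Early stage: concatenate the tokens of both modalities, then self-attend."""
    tokens = rgb_tokens + depth_tokens
    return attention(tokens, tokens, tokens)


def late_fusion(rgb_feats, depth_feats):
    """Late stage: cross-attention, where each modality queries the other."""
    rgb_from_depth = attention(rgb_feats, depth_feats, depth_feats)
    depth_from_rgb = attention(depth_feats, rgb_feats, rgb_feats)
    return rgb_from_depth, depth_from_rgb


# Hypothetical toy input: 4 tokens per modality, 8 features per token.
random.seed(0)
rgb = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
depth = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]

fused = early_fusion(rgb, depth)  # 8 tokens that have attended across modalities
n = len(rgb)
# A real model would apply further transformer stages here; this sketch
# simply splits the fused tokens back into their two modalities.
r, d = late_fusion(fused[:n], fused[n:])
```

In the early stage every token can attend to tokens of both modalities because they share one sequence. In the late stage the queries come from one modality and the keys and values from the other, so each output vector is a weighted mixture of the other modality's features.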

Published in: IEEE MultiMedia ( Volume: 31, Issue: 4, Oct.-Dec. 2024)

Page(s): 25 - 36

Date of Publication: 25 June 2024

DOI: 10.1109/MMUL.2024.3415643
