Inter-Intra Modal Representation Augmentation With DCT-Transformer Adversarial Network for Image-Text Matching


Abstract:

Image-text matching has become a challenging task in the multimedia analysis field. Many advanced methods explore local and global cross-modal correspondence for matching. However, most methods ignore the importance of eliminating potentially irrelevant features from the original features of each modality and from the cross-modal common feature. Moreover, the features extracted from image regions and sentence words contain cluttered background noise and varying occlusion noise, which negatively affect alignment. Unlike these methods, we propose a novel DCT-Transformer Adversarial Network (DTAN) for image-text matching in this paper. This work obtains an effective metric based on two aspects: i) the DCT-Transformer applies the Discrete Cosine Transform (DCT) within a transformer mechanism to extract multi-domain common representations and eliminate irrelevant features across modalities (inter-modal); here, the DCT divides multi-modal content into chunks of different frequencies and quantizes them. ii) The adversarial network introduces an adversarial idea by combining the original features of each single modality with the multi-domain common representation, alleviating the background noise within each modality (intra-modal). The proposed adversarial feature augmentation method readily yields a common representation that is useful only for alignment. Extensive experiments on the benchmark datasets Flickr30K and MS-COCO demonstrate the superiority of the DTAN model over state-of-the-art methods.
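As an illustrative sketch only (the abstract does not specify the authors' implementation), the idea of splitting modal features into fixed-size chunks, transforming each chunk with a DCT, and suppressing high-frequency coefficients before inverting could look as follows; the chunk size, the number of retained coefficients, and the function name `dct_lowpass_chunks` are all assumptions for illustration:

```python
import numpy as np
from scipy.fft import dct, idct

def dct_lowpass_chunks(features, chunk_size=8, keep=4):
    """Split a 1-D feature vector into fixed-size chunks, DCT each chunk,
    zero the high-frequency coefficients (a crude quantization stand-in),
    and invert back to the feature domain."""
    n = len(features) // chunk_size * chunk_size
    chunks = features[:n].reshape(-1, chunk_size)
    coeffs = dct(chunks, type=2, norm="ortho", axis=1)
    coeffs[:, keep:] = 0.0  # discard high-frequency components per chunk
    return idct(coeffs, type=2, norm="ortho", axis=1).reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = dct_lowpass_chunks(x)
print(y.shape)  # (64,)
```

In practice such filtering would be applied to region or word feature maps inside the network rather than to a raw vector, but the frequency-chunking principle is the same.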
Published in: IEEE Transactions on Multimedia ( Volume: 25)
Page(s): 8933 - 8945
Date of Publication: 09 February 2023
