
Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval


Abstract:

As an important field in information retrieval, fine-grained cross-modal retrieval has received great attention from researchers. Existing fine-grained cross-modal retrieval methods have made progress in capturing the fine-grained interplay between vision and language, but they fail to consider the fine-grained correspondences between features in the image latent space and those in the text latent space, which may lead to inaccurate inference of intra-modal relations or false alignment of cross-modal information. Since object detection provides fine-grained correspondences between image region features and their semantic features, this paper proposes a novel latent space semantic supervision model based on knowledge distillation (L3S-KD). For fine-grained alignment in the image latent space, classifiers are trained under the supervision of the fine-grained correspondences obtained from an object detection model via knowledge distillation; for fine-grained alignment in the text latent space, they are supervised by the labels of objects and attributes. Compared with existing fine-grained correspondence matching methods, L3S-KD learns more accurate semantic similarities for local fragments in image-text pairs. Extensive experiments on the MS-COCO and Flickr30K datasets demonstrate that L3S-KD consistently outperforms state-of-the-art methods for image-text matching.
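The abstract describes two supervision signals over the latent spaces: a knowledge-distillation signal from an object detector for image region features, and hard object/attribute labels for text fragment features. The sketch below (Python/PyTorch) illustrates one plausible way such losses could be formed; it is not the authors' implementation, and all module names, dimensions, and the temperature value are illustrative assumptions.

    # Minimal sketch (assumed, not the authors' code) of latent-space semantic
    # supervision: a student classifier over image region features is distilled
    # from a detector's class distributions, while text fragment features are
    # supervised with hard object/attribute labels.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LatentSpaceClassifiers(nn.Module):
        def __init__(self, feat_dim=1024, num_classes=1600, temperature=4.0):
            super().__init__()
            self.img_classifier = nn.Linear(feat_dim, num_classes)  # over image region features
            self.txt_classifier = nn.Linear(feat_dim, num_classes)  # over text fragment features
            self.T = temperature

        def distill_loss(self, region_feats, teacher_logits):
            # KL divergence between the detector's softened class distribution
            # (teacher) and the student's predictions on latent region features.
            student_log_p = F.log_softmax(self.img_classifier(region_feats) / self.T, dim=-1)
            teacher_p = F.softmax(teacher_logits / self.T, dim=-1)
            return F.kl_div(student_log_p, teacher_p, reduction="batchmean") * self.T ** 2

        def text_label_loss(self, word_feats, object_attribute_labels):
            # Cross-entropy on text latent features against object/attribute labels.
            return F.cross_entropy(self.txt_classifier(word_feats), object_attribute_labels)

In this reading, the distillation term aligns the image latent space with the detector's fine-grained semantics, while the label term aligns the text latent space with the same vocabulary of objects and attributes, so that local fragments from the two modalities become comparable.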
Published in: IEEE Transactions on Image Processing ( Volume: 31)
Page(s): 7154 - 7164
Date of Publication: 10 November 2022

PubMed ID: 36355734
