Rich Features Embedding for Cross-Modal Retrieval: A Simple Baseline


Abstract:

Over the past few years, cross-modal retrieval has made significant progress, benefiting from the development of deep neural networks. Meanwhile, the overall frameworks have become increasingly complex, making both training and analysis more difficult. In this paper, we propose a Rich Features Embedding (RFE) approach that tackles cross-modal retrieval tasks in a simple yet effective way. RFE constructs rich representations for both images and texts, which are then used to learn an embedding in the common space with a simple hard triplet loss. Without any bells and whistles in the form of complex components, the proposed RFE is concise and easy to implement. More importantly, RFE obtains state-of-the-art results on several popular benchmarks such as MS COCO and Flickr30K. In particular, image-to-text and text-to-image retrieval achieve 76.1% and 61.1% (R@1) on MS COCO, outperforming other methods by more than 3.4% and 2.3%, respectively. We hope RFE will serve as a solid baseline and help ease future research in cross-modal retrieval.
Published in: IEEE Transactions on Multimedia (Volume: 22, Issue: 9, September 2020)
Page(s): 2354 - 2365
Date of Publication: 12 December 2019
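
The abstract names a hard triplet loss in the common space and reports R@1, but gives no formula for either. The sketch below illustrates one common reading of those terms: a hinge loss on the hardest negative in each retrieval direction, and recall at k over a square image-text similarity matrix. The function names, the margin of 0.2, the batch averaging, and the assumption that matched pairs sit on the diagonal are all illustrative choices, not details taken from the paper.

```python
from typing import List


def hardest_negative_triplet_loss(sim: List[List[float]], margin: float = 0.2) -> float:
    """Hinge loss on the hardest negative, averaged over the batch.

    sim[i][j] is the similarity between image i and text j.
    Assumption: image i and text i form the matched pair (the diagonal),
    and the batch holds at least two pairs.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        # Hardest negative text for image i: the most similar wrong caption.
        hardest_text = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative image for text i: the most similar wrong image.
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hardest_text - positive)
        total += max(0.0, margin + hardest_image - positive)
    return total / n


def recall_at_k(sim: List[List[float]], k: int = 1) -> float:
    """Fraction of images whose matched text ranks in the top k (image-to-text)."""
    n = len(sim)
    hits = 0
    for i in range(n):
        # Rank text indices by descending similarity to image i.
        ranked = sorted(range(n), key=lambda j: sim[i][j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Toy similarity matrix: image 0 is closer to the wrong text than to its own.
    toy = [[0.5, 0.6],
           [0.1, 0.9]]
    print(hardest_negative_triplet_loss(toy))
    print(recall_at_k(toy, k=1))
```

In the toy matrix only image 0 violates the margin, so it alone contributes to the loss, and only image 1 retrieves its own text first. Text-to-image recall can be computed by passing the transposed matrix to the same function.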
