Multi-Dimensional Attentive Hierarchical Graph Pooling Network for Video-Text Retrieval | IEEE Conference Publication | IEEE Xplore

Abstract:

The video-text retrieval task has attracted increasing attention due to the rapid growth of videos on the Internet. Existing works adopt various networks to encode videos and texts into a common latent space and calculate their similarities. However, most works neither mine the significant frames of videos nor account for the differences among dimensions in word representations, leading to unsatisfactory retrieval results. In this paper, we propose a Multi-Dimensional Attentive Hierarchical Graph Pooling Network (MAGP) to learn improved representations for video-text retrieval. Specifically, we design a novel hierarchical graph pooling method that extracts significant frames in videos and discards unrelated frames, so that the model can learn hierarchical and discriminative video representations. Moreover, a multi-dimensional attention mechanism is used in the text encoder to strengthen its representation ability through dimension-level attention. Experimental results on three video-text datasets demonstrate that our MAGP model outperforms the state-of-the-art models.
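
The two ideas named in the abstract can be illustrated with a small sketch. This is not the MAGP implementation, whose details the abstract does not give. It assumes one common reading of each idea: frame pooling as score-based top-k selection applied level by level, and dimension-level attention as a separate softmax over words for every feature dimension. The function names (`topk_frame_pooling`, `multi_dim_attention`), the `ratio` parameter, and the hand-written scores are hypothetical; in a trained model the scores and logits would be learned.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_frame_pooling(frames, scores, ratio=0.5):
    """Keep the highest-scoring frames (hypothetical helper).

    frames: list of feature vectors, one per frame
    scores: one importance score per frame (assumed given here)
    Returns the kept frames in temporal order and their indices.
    """
    k = max(1, math.ceil(ratio * len(frames)))
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore temporal order
    return [frames[i] for i in keep], keep


def multi_dim_attention(words, logits):
    """Dimension-level attention pooling (hypothetical helper).

    words:  n word vectors of dimension d
    logits: n x d attention logits, one per word and per dimension
    Each dimension gets its own softmax over the n words, so different
    dimensions of the sentence vector can attend to different words.
    """
    d = len(words[0])
    pooled = []
    for j in range(d):
        weights = softmax([row[j] for row in logits])
        pooled.append(sum(w * vec[j] for w, vec in zip(weights, words)))
    return pooled


# Two pooling levels: 4 frames -> 2 frames -> 1 frame.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
level1, idx1 = topk_frame_pooling(frames, [0.9, 0.1, 0.7, 0.3], ratio=0.5)
level2, idx2 = topk_frame_pooling(level1, [0.2, 0.6], ratio=0.5)

# Uniform logits reduce dimension-level attention to mean pooling.
sentence = multi_dim_attention([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Stacking the pooling step yields progressively shorter frame sequences, which is one way to obtain a hierarchy of video representations. With non-uniform logits, each dimension of the pooled sentence vector can be dominated by a different word, which ordinary word-level attention cannot express.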
Date of Conference: 05-09 July 2021
Date Added to IEEE Xplore: 09 June 2021
Conference Location: Shenzhen, China
