Coarse-to-Fine Semantic Alignment for Cross-Modal Moment Localization


Abstract:

Video moment localization, an important branch of video content analysis, has attracted extensive attention in recent years. However, it remains in its infancy because of two challenges: cross-modal semantic alignment and localization efficiency. To address them, we present a cross-modal semantic alignment network. Specifically, we first design a video encoder that generates moment candidates, learns their representations, and models their semantic relevance. Meanwhile, we design a query encoder to understand diverse query intentions. We then introduce a multi-granularity interaction module to explore the semantic correlation between the two modalities in depth. Together, these components let the network localize the target moment through sufficient cross-modal semantic understanding. Moreover, we introduce a semantic pruning strategy that reduces cross-modal retrieval overhead and thereby improves localization efficiency. Experimental results on two benchmark datasets demonstrate the superiority of our model over several state-of-the-art competitors.
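
The coarse-to-fine idea in the abstract (generate moment candidates, prune them cheaply, then score the survivors in detail) can be illustrated with a minimal sketch. This is not the authors' network: the function names (`generate_candidates`, `mean_pool`, `localize`), the use of cosine similarity, mean pooling, and a fixed `keep` budget are all assumptions made here for illustration, standing in for the learned encoders and the pruning strategy that the abstract only names.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def generate_candidates(num_clips, max_len):
    """Enumerate (start, end) clip spans, end exclusive, of at most max_len clips."""
    return [
        (s, e)
        for s in range(num_clips)
        for e in range(s + 1, min(s + max_len, num_clips) + 1)
    ]


def mean_pool(clip_feats, span):
    """Average the clip features inside a span (a hypothetical moment representation)."""
    s, e = span
    dim = len(clip_feats[0])
    return [sum(clip_feats[i][d] for i in range(s, e)) / (e - s) for d in range(dim)]


def localize(clip_feats, query_vec, word_vecs, max_len=3, keep=3):
    """Return the best (start, end) span using coarse pruning, then fine scoring."""
    candidates = generate_candidates(len(clip_feats), max_len)

    # Coarse stage (assumed): compare a pooled moment vector with a sentence-level
    # query vector, then keep only the top `keep` candidates.
    ranked = sorted(
        candidates,
        key=lambda span: cosine(mean_pool(clip_feats, span), query_vec),
        reverse=True,
    )
    survivors = ranked[:keep]

    # Fine stage (assumed): for each query word, take its best-matching clip
    # inside the span, and average those scores over all words.
    def fine_score(span):
        s, e = span
        per_word = [
            max(cosine(clip_feats[i], w) for i in range(s, e)) for w in word_vecs
        ]
        return sum(per_word) / len(per_word)

    return max(survivors, key=fine_score)
```

In this sketch the more expensive word-level comparison runs on only `keep` candidates rather than on every span, which is the efficiency argument behind pruning before fine-grained matching.
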
Published in: IEEE Transactions on Image Processing (Volume: 30)
Page(s): 5933 - 5943
Date of Publication: 24 June 2021

PubMed ID: 34166192
