
Referring Image Segmentation via Text Guided Multi-Level Interaction



Abstract:

We focus on the problem of segmenting entities that match given natural language referring expressions. Existing works tackle this problem by fusing the visual and linguistic modalities implicitly, failing to properly align the features of informative words in expressions with those of object regions in images. This makes it difficult to accurately infer the referred entities, especially in scenes with complex object dependencies or interactions. In this paper, we propose a Text Guided Multi-level Interaction (TGMI) method to effectively address this challenging task. Specifically, TGMI imitates the way humans resolve complex referring dependencies, proceeding from local target localization to global relation lookup. The proposed method has the following advantages: a) it highlights image areas described by the language by gradually fusing the features of each word with those of the given image; b) it enables accurate relation matching between entire sentences and images; c) it ensures correct information fusion at both the local and global levels. Experimental results on three standard datasets show significant improvements over all compared baseline models, demonstrating the effectiveness of our method.
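
As a rough illustration of the two-level interaction the abstract describes (word-by-word local fusion followed by sentence-level global relation matching), the PyTorch sketch below may help. The module, its tensor shapes, and all layer names are hypothetical assumptions made for illustration only; they are not the authors' actual TGMI implementation, which is not given on this page.

# Minimal, hypothetical sketch of two-level text-image fusion.
# Local step: each word feature gates and modulates the visual feature map.
# Global step: the pooled sentence feature attends over the fused map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelTextImageFusion(nn.Module):
    def __init__(self, vis_dim: int, txt_dim: int, hid_dim: int = 256):
        super().__init__()
        self.vis_proj = nn.Conv2d(vis_dim, hid_dim, kernel_size=1)
        self.word_proj = nn.Linear(txt_dim, hid_dim)
        self.sent_proj = nn.Linear(txt_dim, hid_dim)
        self.word_gate = nn.Linear(hid_dim, 1)            # how strongly each word contributes
        self.out_head = nn.Conv2d(hid_dim, 1, kernel_size=1)  # coarse mask logits

    def forward(self, image_feat, word_feats):
        # image_feat: (B, vis_dim, H, W); word_feats: (B, T, txt_dim)
        v = self.vis_proj(image_feat)                     # (B, D, H, W)
        B, D, H, W = v.shape

        # Local, word-level interaction: fuse the image with each word's feature.
        words = self.word_proj(word_feats)                # (B, T, D)
        gates = torch.sigmoid(self.word_gate(words))      # (B, T, 1)
        lang_local = (gates * words).sum(dim=1) / (gates.sum(dim=1) + 1e-6)  # (B, D)
        v_local = v * lang_local.unsqueeze(-1).unsqueeze(-1)                 # (B, D, H, W)

        # Global, sentence-level interaction: whole-sentence relation matching.
        sent = self.sent_proj(word_feats.mean(dim=1))     # (B, D) pooled sentence feature
        flat = v_local.flatten(2)                         # (B, D, H*W)
        attn = torch.softmax(
            torch.einsum("bd,bdn->bn",
                         F.normalize(sent, dim=-1),
                         F.normalize(flat, dim=1)), dim=-1)                  # (B, H*W)
        v_global = v_local + (flat * attn.unsqueeze(1)).view(B, D, H, W)

        return self.out_head(v_global)                    # (B, 1, H, W) mask logits

if __name__ == "__main__":
    model = TwoLevelTextImageFusion(vis_dim=512, txt_dim=300)
    img = torch.randn(2, 512, 20, 20)   # e.g. backbone feature map
    txt = torch.randn(2, 12, 300)       # e.g. 12 word embeddings per expression
    print(model(img, txt).shape)        # torch.Size([2, 1, 20, 20])

The sketch only mirrors the abstract's structure: a local pass driven by individual word features and a global pass driven by the whole sentence, with the mask predicted from the doubly fused map.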
Date of Conference: 10-12 August 2023
Date Added to IEEE Xplore: 05 September 2023
Print on Demand (PoD) ISSN: 2377-8644
Conference Location: Dalian, China
