Abstract:
While significant progress has been made in recognizing entities from plain text, entity recognition from multimodal data remains underexplored because of disparities in semantic representation across modalities. To address this challenge, we propose a novel entity recognition model, Heterogeneous Graph Reasoning (HGR), which leverages the complementary nature of visual and textual data through two modules: Vision Refine and Graph Cross Inference. The Vision Refine module selects semantically relevant objects hidden in the image to aid textual entity extraction. The Graph Cross Inference module performs cross-association inference between visual regions and textual entities through four steps: graph construction, heterogeneous graph fusion, visual region refinement, and cross inference. Extensive experiments on four multimodal datasets demonstrate the superiority of our model over the second-best state-of-the-art model.
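To make the two-module pipeline concrete, here is a minimal, hypothetical sketch; it is not the authors' HGR implementation. It assumes text tokens and image regions are already embedded as vectors of equal dimension. The function names `vision_refine` and `cross_inference` are illustrative. A top-k cosine-similarity filter stands in for Vision Refine, and a single round of dot-product attention from text nodes to the retained region nodes stands in for Graph Cross Inference. Heterogeneous graph fusion and all learned parameters are omitted.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def vision_refine(text: List[Vector], regions: List[Vector], k: int) -> List[int]:
    """Hypothetical stand-in for Vision Refine: score each region by its best
    cosine similarity to any text token, keep the indices of the top-k regions
    (returned in ascending index order)."""
    scores = [max(cosine(r, t) for t in text) for r in regions]
    ranked = sorted(range(len(regions)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def cross_inference(text: List[Vector], regions: List[Vector],
                    keep: List[int]) -> List[Vector]:
    """Hypothetical stand-in for cross inference: every text node is linked to
    every kept region node; each text node aggregates region features with
    dot-product attention weights and adds the message residually."""
    updated = []
    for t in text:
        logits = [sum(x * y for x, y in zip(t, regions[j])) for j in keep]
        weights = softmax(logits)
        message = [sum(w * regions[j][d] for w, j in zip(weights, keep))
                   for d in range(len(t))]
        updated.append([a + b for a, b in zip(t, message)])
    return updated


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]                  # two toy token embeddings
    regions = [[1.0, 0.1], [-1.0, 0.0], [0.0, 0.9]]  # three toy region embeddings
    keep = vision_refine(text, regions, k=2)         # region 1 is dropped
    print(keep)
    print(cross_inference(text, regions, keep))
```

The filter runs before message passing, so regions that resemble no token (region 1 in the toy data) never contribute to the text representations. This mirrors, in simplified form, the abstract's stated goal of keeping only semantically relevant objects.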
Date of Conference: 09-13 October 2023
Date Added to IEEE Xplore: 06 November 2023