Abstract:
Most image captioning methods directly learn the mapping from image to text. In practice, however, attending to both sentence structure and visual content at the same time can be difficult. In this paper, we propose a model, called Re-correct Net, which uses captions already produced by other captioners to guide the visual content when generating a new caption. In addition, to obtain a more accurate caption, our method uses existing textual entities as additional prior knowledge. Experiments show that our model can be attached as a re-correct block after any captioner's training, which improves caption quality and is also flexible.
Published in: 2021 7th IEEE International Conference on Network Intelligence and Digital Content (IC-NIDC)
Date of Conference: 17-19 November 2021
Date Added to IEEE Xplore: 04 January 2022