Abstract:
Multimodal problems such as caption generation advance AI as a whole because they require the integration of several key domains, such as computer vision, NLP, and knowledge representation. In this paper, we develop a new approach to evaluating captioning models by verifying them with Markov Logic Networks (MLNs). Specifically, we compile an MLN from training data and perform probabilistic inference to estimate the uncertainty in a generated caption. To reify the caption, we leverage advances in Natural Language Inference (NLI) models and convert the caption into a query for the MLN. Further, we add visual context to the MLN distribution using an attention-based Multiple Instance Learning model and evaluate a caption based on this augmented distribution. We perform experiments using MSCOCO on several state-of-the-art benchmarks and show that our approach can evaluate captioning models just as effectively as methods that require human-generated captions.
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023