Can Visual Linguistic Models become Knowledge Bases: A Prompt Learning Framework for Size Perception


Abstract:


Empowering machines to understand our physical world should go beyond models limited to natural language alone or to vision alone. Vision-and-language is a growing field of study that attempts to bridge the gap between the natural language processing and computer vision communities by enabling models to learn visually grounded language. However, because an increasing number of pre-trained visual-linguistic models focus on the alignment between visual regions and natural language, it is difficult to claim that these models capture certain physical properties of objects, such as size, in their latent space. Inspired by recent trends in prompt learning, this study designs a prompt learning framework for two visual-linguistic models, ViLBERT and ViLT, and uses manually crafted prompt templates to evaluate how consistently these models compare the sizes of objects.
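To make the probing setup concrete, the sketch below shows what a set of manually crafted, cloze-style prompt templates for size comparison might look like. The templates, object pairs, and helper function are illustrative assumptions for exposition, not the paper's actual prompts; in the study, the masked slot would be scored by a masked-language visual-linguistic model such as ViLT.

```python
# Hypothetical sketch of cloze-style prompt templates for probing size
# perception. Templates and object pairs are illustrative assumptions,
# not the paper's actual prompts.

TEMPLATES = [
    "The {a} is [MASK] than the {b} in size.",
    "Compared with the {b}, the {a} is [MASK].",
]

# Each pair carries an expected comparative answer for consistency checks.
SIZE_PAIRS = [
    ("elephant", "cat"),  # expected completion: "larger" / "bigger"
    ("ant", "dog"),       # expected completion: "smaller"
]

def build_prompts(templates, pairs):
    """Fill every template with every object pair, producing cloze prompts
    whose [MASK] token a visual-linguistic model would be asked to complete."""
    return [t.format(a=a, b=b) for t in templates for a, b in pairs]

prompts = build_prompts(TEMPLATES, SIZE_PAIRS)
```

Swapping the roles of `{a}` and `{b}` across templates is one way to test consistency: a model that truly encodes size should give complementary answers ("larger" vs. "smaller") when the objects are reversed.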
Date of Conference: 08-10 December 2022
Date Added to IEEE Xplore: 21 April 2023
Conference Location: Toronto, ON, Canada

