To Boost Zero-Shot Generalization for Embodied Reasoning With Vision-Language Pre-Training


Abstract:

Research interest in embodied artificial intelligence (EAI), in which an agent learns to perform a specific task while dynamically interacting with its surrounding 3D environment, has recently increased. A new challenge in this setting is that, as the number of object categories in 3D scenes grows, many unseen objects may appear. This makes it necessary to develop models with strong zero-shot generalization to new objects. Existing work pursues this goal by providing embodied agents with massive amounts of high-quality human annotations closely related to the task to be learned, which is too costly in practice. Inspired by recent advances in pre-trained models for 2D visual tasks, we attempt to boost zero-shot generalization for embodied reasoning with vision-language pre-training, which can encode common sense as general prior knowledge. To further improve performance on a specific task, we rectify the pre-trained representation through masked scene graph modeling (MSGM) in a self-supervised manner, learning task-specific knowledge through iterative message passing. Our method improves a variety of representative embodied reasoning tasks by a large margin (e.g., by over 5.0% in answer accuracy on the MP3D-EQA dataset, which consists of many real-world scenes with a large number of new objects at test time) and achieves new state-of-the-art performance.
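The abstract describes rectifying pre-trained representations with masked scene graph modeling (MSGM), where task-specific knowledge is learned through iterative message passing, but it gives no model details. The following is therefore only a minimal, generic sketch of a masked-node reconstruction objective on a scene graph, not the paper's MSGM. Every concrete choice is an assumption: a zero vector as the mask token, neighbour-mean aggregation with a fixed mixing weight `alpha`, and mean squared error computed on the masked nodes only. The function names and the toy table/chair/lamp graph are hypothetical, and the 2-D feature vectors merely stand in for pre-trained object representations.

```python
# Hypothetical illustration of masked node reconstruction on a scene graph.
# Assumed choices (not from the paper): zero mask token, neighbour-mean
# aggregation with a fixed mixing weight, MSE on masked nodes only.
from typing import Dict, List, Set

Vector = List[float]
Graph = Dict[str, List[str]]  # adjacency list: node -> neighbouring nodes


def mask_nodes(features: Dict[str, Vector], masked: Set[str]) -> Dict[str, Vector]:
    """Replace the features of masked nodes with an all-zero 'mask token'."""
    return {
        node: ([0.0] * len(vec) if node in masked else list(vec))
        for node, vec in features.items()
    }


def message_passing_step(h: Dict[str, Vector], edges: Graph, alpha: float) -> Dict[str, Vector]:
    """One round: each node mixes its own state with the mean of its neighbours' states."""
    new_h: Dict[str, Vector] = {}
    for node, vec in h.items():
        neighbours = edges.get(node, [])
        if not neighbours:
            new_h[node] = list(vec)  # isolated node keeps its state
            continue
        mean = [sum(h[n][d] for n in neighbours) / len(neighbours) for d in range(len(vec))]
        new_h[node] = [alpha * vec[d] + (1.0 - alpha) * mean[d] for d in range(len(vec))]
    return new_h


def masked_reconstruction_loss(
    features: Dict[str, Vector],
    edges: Graph,
    masked: Set[str],
    steps: int = 3,
    alpha: float = 0.5,
) -> float:
    """Mean squared error between original and reconstructed features of masked nodes."""
    if not masked:
        raise ValueError("at least one node must be masked")
    h = mask_nodes(features, masked)
    for _ in range(steps):  # iterative message passing
        h = message_passing_step(h, edges, alpha)
    errors = [
        (h[node][d] - features[node][d]) ** 2
        for node in masked
        for d in range(len(features[node]))
    ]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy scene graph: a chair and a lamp, both related to a table.
    features = {"table": [1.0, 0.0], "chair": [1.0, 0.0], "lamp": [0.0, 1.0]}
    edges = {"table": ["chair", "lamp"], "chair": ["table"], "lamp": ["table"]}
    for steps in (0, 1):
        # prints "0 0.5" then "1 0.125"
        print(steps, masked_reconstruction_loss(features, edges, {"chair"}, steps=steps))
```

Masking a node and reconstructing it only from its neighbours is what makes such an objective self-supervised: the targets are the original features, so no human annotations are needed. In an actual training setup the aggregation and update steps would carry learnable parameters optimized to minimize this loss; they are fixed here only to keep the sketch dependency-free.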
Published in: IEEE Transactions on Image Processing (Volume 33)
Page(s): 5370 - 5381
Date of Publication: 18 September 2024

PubMed ID: 39292596
