Transformer Memory for Interactive Visual Navigation in Cluttered Environments


Abstract:

Substantial progress has been achieved in embodied visual navigation based on reinforcement learning (RL). These studies presume that the environment is stationary, with all obstacles static. However, in real cluttered scenes, interactable objects (e.g., shoes and boxes) that block the robot's path make the environment non-stationary. Accordingly, the ego-centric visual agent easily gets stuck when searching for the next waypoint, as it struggles to decide whether to push the obstacles ahead. To handle this predicament, we formulate interactive visual navigation as a Partially Observable Markov Decision Process (POMDP). Because the transformer encoder has demonstrated a superior ability to capture spatial-temporal dependencies in natural language processing, we propose a transformer-based memory that enables agents to utilize historical interaction information. However, directly applying the transformer architecture in RL settings is highly unstable. We therefore propose a surrogate objective, predicting the next waypoint as an auxiliary task, which facilitates representation learning and bootstraps the RL. We demonstrate our method in the iGibson environment, and experimental results show a significant improvement over the Interactive Gibson benchmark and the related recurrent RL policy in both the validation (seen) scenes and the test (unseen) scenes.
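
The two ideas named in the abstract, attending over a memory of past observations and adding a waypoint-prediction term to the RL loss, can be illustrated with a minimal sketch. It is not the paper's architecture: the single-head attention without learned projections, the fixed-length memory, the squared-error waypoint loss, and the names `attend`, `combined_loss`, and `aux_weight` are all assumptions made for illustration.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query over stored embeddings.

    Keys and values are the raw embeddings (no learned projections),
    which is a simplification for illustration only.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in memory]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]
    return context, weights


def combined_loss(rl_loss, predicted_waypoint, target_waypoint, aux_weight=0.5):
    """RL loss plus a weighted squared-error waypoint term (assumed form)."""
    aux = sum((p - t) ** 2 for p, t in zip(predicted_waypoint, target_waypoint))
    return rl_loss + aux_weight * aux


# Hypothetical episode: keep only the last 3 observation embeddings as memory.
memory = deque(maxlen=3)
for embedding in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]):
    memory.append(embedding)

# The current observation embedding queries the stored history.
context, weights = attend([1.0, 0.0], list(memory))
loss = combined_loss(rl_loss=1.0, predicted_waypoint=[0.5, 0.5], target_waypoint=[1.0, 0.0])
```

In this toy setup the oldest embedding is evicted once the memory is full, and the attention weights favour the stored embedding most aligned with the current query. A real agent would use learned projections, multiple heads, and a trained waypoint head.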
Published in: IEEE Robotics and Automation Letters (Volume: 8, Issue: 3, March 2023)
Page(s): 1731 - 1738
Date of Publication: 02 February 2023
