Self-Supervised Multi-Frame Monocular Depth Estimation for Dynamic Scenes | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Self-supervised multi-frame depth estimation outperforms single-frame approaches by utilizing not only appearance information but also geometric information. A common practice for multi-frame methods is to employ feature-metric bundle adjustment (FBA) to refine a depth map initialized from the single-frame prior. However, FBA cannot always provide effective residual updates because the matching costs are unreliable: they are corrupted by thin texture, occlusion, and especially object motion. To tackle this problem, we propose a context-aware transformer (CAT) that refines the corrupted matching costs by leveraging spatial context information. Specifically, the CAT adaptively aggregates matching costs according to the spatial affinity inferred from local appearance context, and produces reliable contextual costs for FBA. Moreover, we design a motion-aware regularization loss to provide supervision for regions with moving objects, making the CAT suitable for dynamic scenes. Extensive experiments and analyses on the KITTI and Cityscapes datasets demonstrate the effectiveness and superior generalization capability of our approach.
Page(s): 4989 - 5001
Date of Publication: 07 December 2023
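
The abstract's central idea, aggregating matching costs according to a spatial affinity inferred from local appearance, can be illustrated with a small NumPy sketch. This is not the authors' CAT: the transformer is replaced here by a plain dot-product softmax over a fixed square window, and the function name `aggregate_costs`, the parameters `radius` and `temperature`, and the array layouts are all assumptions made for illustration.

```python
import numpy as np

def aggregate_costs(cost, feat, radius=1, temperature=1.0):
    """Affinity-weighted aggregation of matching costs over a local window.

    Hypothetical stand-in for context-aware cost refinement (not the paper's CAT).
    cost: (D, H, W) matching costs for D depth hypotheses.
    feat: (C, H, W) appearance features used to infer spatial affinity.
    Returns an array of shape (D, H, W).
    """
    D, H, W = cost.shape
    pad = radius
    # Replicate borders so every pixel has a full window of neighbours.
    cost_p = np.pad(cost, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    feat_p = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")

    logits, shifted = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys, xs = pad + dy, pad + dx
            nb_feat = feat_p[:, ys:ys + H, xs:xs + W]
            nb_cost = cost_p[:, ys:ys + H, xs:xs + W]
            # Affinity logit: feature similarity between a pixel and its neighbour.
            logits.append((feat * nb_feat).sum(axis=0) / temperature)
            shifted.append(nb_cost)

    logits = np.stack(logits)    # (K, H, W), K = (2 * radius + 1) ** 2
    shifted = np.stack(shifted)  # (K, D, H, W)

    # Numerically stable softmax over the K neighbours.
    logits = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)

    # Each output cost is a convex combination of neighbouring costs.
    return (weights[:, None] * shifted).sum(axis=0)
```

Because the weights at each pixel sum to one, a pixel whose own cost is corrupted (for example by occlusion) borrows evidence from neighbours with similar appearance, while dissimilar neighbours contribute little. A learned attention module would replace the fixed dot-product affinity used in this sketch.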
