Abstract:
With the advent of virtual reality and augmented reality applications, omnidirectional imaging and 360° cameras have become increasingly popular in scenarios such as entertainment and autonomous systems. In this paper, we propose a self-supervised framework for multi-task learning of depth, camera motion, and semantics from panoramic videos. Specifically, our method is based on differentiable warping of adjacent views onto the target view. We make two improvements. First, we introduce a view synthesis module based on equirectangular projection that enables direct optimization on panoramic images. Second, we introduce a self-supervised segmentation branch that imposes a semantic-consistency constraint for further improvement. Extensive experiments on two 360° video datasets and two 360° image datasets demonstrate that our method outperforms the state of the art and achieves favorable cross-modality performance.
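The abstract's core mechanism, warping adjacent views onto the target via equirectangular projection, can be illustrated with a minimal NumPy sketch. This is our own illustration with assumed coordinate conventions, not the authors' code: each target pixel is back-projected through its predicted depth, rigidly transformed by the predicted relative pose (R, t), and re-projected onto the source panorama's equirectangular grid. In the paper's self-supervised setting, these coordinates would feed a differentiable bilinear sampler so the photometric loss can be optimized directly on panoramic images.

```python
# Hypothetical sketch of equirectangular view synthesis (not the authors'
# implementation): compute, for every target pixel, where to sample in the
# source panorama given predicted depth and relative camera pose.
import numpy as np

def equirect_warp_coords(depth, R, t):
    """Return (u_s, v_s) sampling coordinates in the source panorama.

    depth : (H, W) predicted per-pixel depth of the target panorama
    R, t  : predicted relative rotation (3, 3) and translation (3,)
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Pixel coordinates -> spherical angles (longitude lam, latitude phi).
    lam = (u + 0.5) / W * 2.0 * np.pi - np.pi     # in [-pi, pi)
    phi = np.pi / 2.0 - (v + 0.5) / H * np.pi     # in (-pi/2, pi/2)

    # Spherical angles -> unit rays, scaled by depth to get 3D points.
    rays = np.stack([np.cos(phi) * np.sin(lam),
                     np.sin(phi),
                     np.cos(phi) * np.cos(lam)], axis=-1)
    pts = depth[..., None] * rays

    # Rigid transform into the source camera frame.
    pts = pts @ R.T + t

    # 3D points -> spherical angles seen from the source camera.
    r = np.linalg.norm(pts, axis=-1).clip(min=1e-8)
    lam_s = np.arctan2(pts[..., 0], pts[..., 2])
    phi_s = np.arcsin(pts[..., 1] / r)

    # Spherical angles -> source pixel coordinates (continuous values,
    # intended for a differentiable bilinear sampler during training).
    u_s = (lam_s + np.pi) / (2.0 * np.pi) * W - 0.5
    v_s = (np.pi / 2.0 - phi_s) / np.pi * H - 0.5
    return u_s, v_s
```

Note that, unlike the pinhole case, there is no camera intrinsics matrix here: the equirectangular mapping between pixels and viewing directions is fixed, which is what lets the method operate directly on panoramic frames.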
Published in: IEEE Signal Processing Letters (Volume: 28)