Abstract:
Incorporating depth information into RGB images has proven effective for semantic segmentation. Multi-modal feature fusion, which integrates depth and RGB features, is a crucial component in determining segmentation accuracy. Most existing multi-modal feature fusion schemes enhance multi-modal features via channel-wise attention modules that leverage global context information. In this work, we propose a novel pyramid-context guided fusion (PCGF) module to fully exploit the complementary information in the depth and RGB features. The proposed PCGF utilizes both local and global contexts inside the attention module to provide effective guidance for fusing cross-modal features of inconsistent semantics. Moreover, we introduce a lightweight yet practical multi-level general fusion module that combines features at multiple levels of abstraction to enable high-resolution prediction. Using the proposed feature fusion modules, our Pyramid-Context Guided Network (PCGNet) learns discriminative features by taking full advantage of multi-modal and multi-level information. Comprehensive experiments demonstrate that the proposed PCGNet achieves state-of-the-art performance on two benchmark datasets, NYUDv2 and SUN RGB-D.
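The abstract does not include implementation details, but the idea of guiding channel attention with both local and global contexts can be illustrated with a minimal PyTorch sketch. The sketch below assumes a design in which the concatenated RGB and depth features are pooled at several spatial scales (a global pool plus local pools), each pooled context produces per-channel attention logits, and the re-weighted concatenation is projected back to the original channel width. All class, parameter, and size choices here are hypothetical and not taken from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidContextGuidedFusion(nn.Module):
    """Hypothetical sketch: fuse RGB and depth features with channel attention
    guided by pooled contexts at several spatial scales (global + local)."""

    def __init__(self, channels, pool_sizes=(1, 2, 4)):
        super().__init__()
        self.pool_sizes = pool_sizes
        # A small bottleneck MLP (as 1x1 convs) turns each pooled context
        # into per-channel attention logits for the concatenated features.
        self.context_mlps = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(2 * channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, 2 * channels, kernel_size=1),
            )
            for _ in pool_sizes
        ])
        # 1x1 conv reduces the re-weighted concatenation back to `channels`.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat, depth_feat):
        x = torch.cat([rgb_feat, depth_feat], dim=1)           # (B, 2C, H, W)
        h, w = x.shape[2:]
        logits = 0
        for size, mlp in zip(self.pool_sizes, self.context_mlps):
            ctx = F.adaptive_avg_pool2d(x, size)               # pooled context
            ctx = mlp(ctx)                                     # channel logits
            # Upsample local contexts so every scale contributes per location.
            logits = logits + F.interpolate(ctx, size=(h, w), mode='nearest')
        attn = torch.sigmoid(logits)                           # channel attention
        return self.fuse(x * attn)                             # fused feature map


# Usage sketch: fuse one level of encoder features.
rgb = torch.randn(2, 256, 30, 40)
depth = torch.randn(2, 256, 30, 40)
fused = PyramidContextGuidedFusion(256)(rgb, depth)           # (2, 256, 30, 40)
```

The point of the multi-scale pooling is that the global pool captures scene-level statistics while the local pools retain region-specific cues, so the attention can reconcile RGB and depth features whose semantics disagree locally.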
Date of Conference: 18-22 July 2022
Date Added to IEEE Xplore: 23 August 2022