Abstract:
Semantic segmentation is a crucial task with wide-ranging applications, including autonomous driving and robot navigation. However, prevailing state-of-the-art methods primarily focus on monocular images, neglecting the untapped potential of the stereo cameras commonly equipped on autonomous vehicles and robots, which capture binocular images. In this article, we introduce an innovative stereo-vision-based semantic segmentation framework that fully exploits stereo image data to enhance segmentation performance. Unlike conventional monocular approaches that use only one image, our method effectively uses both images, exploiting inter-image correspondences and harnessing previously neglected information. Our core innovations encompass label generation for right images, combined with stereo-vision-based information fusion. For label generation, we propose a novel technique to accurately generate labels for the right images in stereo pairs, even in scenarios with no direct annotations. This approach enables our models to learn from the complete stereo dataset, enhancing their semantic segmentation capabilities. In addition, our stereo-vision-based information fusion framework seamlessly integrates features and spatial disparities from the binocular images, enabling our models to produce more accurate and contextually enriched semantic segmentation outputs. To validate the efficacy of the proposed approach, we conduct comprehensive experiments on the Cityscapes and KITTI datasets using diverse, well-known semantic segmentation architectures. The results demonstrate the superiority and effectiveness of our method.
Published in: IEEE Transactions on Instrumentation and Measurement (Volume: 72)
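
The abstract does not spell out how the right-image labels are produced. One plausible realization, assuming rectified stereo pairs with a per-pixel left-referenced disparity map, is to warp the left-image ground truth into the right view. The Python sketch below illustrates only that assumed mechanism, not necessarily the paper's actual procedure; the function name warp_labels_left_to_right and the IGNORE_LABEL constant are hypothetical, and occlusion handling is simplified to a disparity-ordered scatter.

import numpy as np

IGNORE_LABEL = 255  # common Cityscapes convention for unlabeled pixels

def warp_labels_left_to_right(left_label: np.ndarray,
                              left_disparity: np.ndarray) -> np.ndarray:
    """Project left-image semantic labels into the right view of a
    rectified stereo pair.

    In rectified stereo, a left pixel (x, y) with disparity d maps to
    (x - d, y) in the right image. Right pixels that receive no
    projection (occlusions, out-of-frame) keep IGNORE_LABEL.
    """
    h, w = left_label.shape
    right_label = np.full((h, w), IGNORE_LABEL, dtype=left_label.dtype)

    ys, xs = np.mgrid[0:h, 0:w]
    xs_right = xs - np.round(left_disparity).astype(np.int32)
    # Keep only pixels that land inside the right image and have a
    # valid (positive) disparity estimate.
    valid = (xs_right >= 0) & (xs_right < w) & (left_disparity > 0)

    # Scatter in order of increasing disparity so that, where several
    # left pixels project to the same right pixel, the larger-disparity
    # (nearer) surface is written last and wins.
    order = np.argsort(left_disparity[valid], kind="stable")
    ys_v, xs_v = ys[valid][order], xs_right[valid][order]
    right_label[ys_v, xs_v] = left_label[valid][order]
    return right_label

The disparity-ordered scatter acts as a z-buffer-style occlusion rule, and right-view pixels left at IGNORE_LABEL can simply be excluded from the training loss, so the warped labels remain usable even where no direct annotation exists.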