Real-time Semantic Segmentation with Parallel Multiple Views Feature Augmentation

Published: 10 October 2022


Real-time semantic segmentation is essential for many practical applications, and it commonly integrates attention-based feature aggregation into lightweight structures to improve accuracy and efficiency. However, existing attention-based methods ignore 1) high-level and low-level feature augmentation guided by spatial information, and 2) low-level feature augmentation guided by semantic context. As a result, feature gaps between multi-level features and noise in low-level spatial details remain. To address these problems, a new real-time semantic segmentation network, called MvFSeg, is proposed. In MvFSeg, parallel convolution with multiple depths is designed as a context head to generate and integrate multi-view features with larger receptive fields. Moreover, MvFSeg introduces multi-view feature augmentation strategies that exploit spatial and semantic guidance to augment shallow and deep features in both an inter-layer and an intra-layer manner. These strategies eliminate feature gaps between multi-level features, filter out the noise in spatial details, and provide spatial and semantic guidance for multi-level features. By combining the multi-view features and the augmented features from the lightweight network through progressive dense aggregation structures, MvFSeg effectively captures invariance at various scales and generates high-quality segmentation results. Experiments on the Cityscapes and CamVid benchmarks show that MvFSeg outperforms existing state-of-the-art methods.

Traditional real-time segmentation methods struggle to segment complex scenes because of weak feature representation, insufficient perceptual regions, limited feature views, and feature gaps among multi-level features. To alleviate these problems, this paper proposes a new real-time segmentation network, MvFSeg, for multi-view feature augmentation and aggregation. In MvFSeg, a lightweight backbone network is adopted to generate four-stage hierarchical shallow and deep features. Parallel multiple-depth convolution is proposed to generate higher-level features with multi-view receptive fields. Multi-level feature augmentation is designed to provide spatial information and semantic context to features at all levels, eliminating the spatial and semantic gaps among different features. Progressive dense feature aggregation is devised to fuse the original multi-level features with the augmented features, increasing the number of feature views and enhancing feature representation.
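
As a rough illustration of why parallel branches of different depths yield features with different "views", the sketch below computes the theoretical receptive field of each branch using the standard recurrence for stacked convolutions. The branch depths (1, 2, 3), the 3x3 kernels with stride 1, and the names `receptive_field` and `branches` are assumptions chosen for illustration; the actual configuration of the MvFSeg context head is not specified in this summary.

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of convolution layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    applied in order from input to output.
    """
    rf, jump = 1, 1  # jump = spacing of adjacent output pixels, in input pixels
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical context head: three parallel branches of depth 1, 2 and 3,
# each built from 3x3, stride-1, undilated convolutions (an assumption).
branches = {depth: [(3, 1, 1)] * depth for depth in (1, 2, 3)}

# Each branch "sees" a different neighbourhood of the same input feature map.
views = {depth: receptive_field(layers) for depth, layers in branches.items()}

for depth, rf in views.items():
    print(f"branch depth {depth}: receptive field {rf}x{rf}")
```

Under these assumptions, deeper branches cover progressively larger neighbourhoods of the same feature map, so integrating their outputs combines several receptive-field sizes at once.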


Author Tags

1. attention mechanism
2. feature aggregation
3. feature augmentation
4. multi-view features
5. real-time semantic segmentation


