
Real-time Semantic Segmentation with Parallel Multiple Views Feature Augmentation

Authors:

Jian-Jun Qiao,

Zhi-Qi Cheng,

Xiao Wu,

Wei Li,

Ji Zhang

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 6300 - 6308

https://doi.org/10.1145/3503161.3547786

Published: 10 October 2022

Abstract

Real-time semantic segmentation is essential for many practical applications. Existing methods integrate attention-based feature aggregation into lightweight structures to improve accuracy and efficiency. However, these attention-based methods ignore 1) high-level and low-level feature augmentation guided by spatial information, and 2) low-level feature augmentation guided by semantic context. As a result, feature gaps between multi-level features and noise in low-level spatial details remain. To address these problems, a new real-time semantic segmentation network, called MvFSeg, is proposed. In MvFSeg, parallel convolution with multiple depths is designed as a context head to generate and integrate multi-view features with larger receptive fields. Moreover, MvFSeg introduces multi-view feature augmentation strategies that exploit spatial and semantic guidance to augment shallow and deep features in both an inter-layer and an intra-layer manner. These strategies eliminate feature gaps between multi-level features, filter out the noise of spatial details, and provide spatial and semantic guidance for multi-level features. By combining multi-view features and augmented features from the lightweight network with progressive dense aggregation structures, MvFSeg effectively captures invariance at various scales and generates high-quality segmentation results. Experiments on the Cityscapes and CamVid benchmarks show that MvFSeg outperforms existing state-of-the-art methods.
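The link between convolution depth and receptive field mentioned in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper: the 3x3 kernel size, the stride of 1, the branch depths 1, 2 and 3, and the function names are all assumptions chosen for illustration. It only shows why parallel branches of different depths see differently sized regions of the same input.

```python
def stacked_receptive_field(depth, kernel_size=3, stride=1):
    """Receptive field (in input pixels, along one axis) of `depth`
    identical convolution layers applied one after another."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for _ in range(depth):
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


def multi_view_receptive_fields(depths, kernel_size=3):
    """Map each (hypothetical) parallel branch depth to its receptive field."""
    return {d: stacked_receptive_field(d, kernel_size) for d in depths}


# Assumed branch depths; the abstract does not state the actual values.
views = multi_view_receptive_fields([1, 2, 3])
print(views)  # {1: 3, 2: 5, 3: 7}
```

Under these assumptions, each additional 3x3 layer in a branch widens its view by two pixels per axis, so integrating the branches combines several views of the same location.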

Supplementary Material

MP4 File (MM22-fp0222.mp4)

Traditional real-time segmentation methods struggle to segment complex scenes because of weak feature representation, insufficient perceptual regions, limited feature views, and feature gaps among multi-level features. To alleviate these problems, this paper proposes MvFSeg, a new real-time segmentation network for multi-view feature augmentation and aggregation. In MvFSeg, a lightweight backbone network generates four stages of hierarchical shallow and deep features. Parallel multiple-depth convolution generates higher-level features with multi-view receptive fields. Multi-level feature augmentation provides spatial information and semantic context to features at all levels, eliminating the spatial and semantic gaps among them. Progressive dense feature aggregation fuses the original multi-level features with the augmented features, increasing feature views and enhancing feature representation ability.
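As a rough illustration of what semantic guidance for shallow features can look like, the sketch below gates a high-resolution low-level map with a sigmoid of an upsampled low-resolution high-level map. This is a generic attention-style gate, not the augmentation module described in the paper. The function names, the nearest-neighbour upsampling, the single-channel 2D lists and the toy values are all assumptions made to keep the example self-contained.

```python
import math


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    rows, cols = len(fmap), len(fmap[0])
    return [[fmap[r // factor][c // factor] for c in range(cols * factor)]
            for r in range(rows * factor)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_gate(low, high, factor):
    """Scale each low-level value by a gate in (0, 1) computed from the
    upsampled high-level map (hypothetical stand-in for semantic guidance)."""
    gate = upsample_nearest(high, factor)
    return [[low[r][c] * sigmoid(gate[r][c]) for c in range(len(low[0]))]
            for r in range(len(low))]


# Toy inputs: a 4x4 shallow map and a 2x2 deep map at half the resolution.
low = [[1.0, 1.0, 1.0, 1.0] for _ in range(4)]
high = [[4.0, -4.0], [-4.0, 4.0]]
out = semantic_gate(low, high, factor=2)
```

In this toy case, low-level responses are kept where the high-level map is strongly positive and suppressed where it is strongly negative, which is one simple way noisy spatial detail could be filtered by semantic context.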


