DOI: 10.1145/3503161.3547786
research-article

Real-time Semantic Segmentation with Parallel Multiple Views Feature Augmentation

Published: 10 October 2022

Abstract

Real-time semantic segmentation is essential for many practical applications. Existing methods typically add attention-based feature aggregation to lightweight structures to improve accuracy and efficiency. However, these attention-based methods ignore 1) high-level and low-level feature augmentation guided by spatial information, and 2) low-level feature augmentation guided by semantic context, so feature gaps between multi-level features and noise in low-level spatial details persist. To address these problems, a new real-time semantic segmentation network, called MvFSeg, is proposed. In MvFSeg, parallel convolution with multiple depths is designed as a context head to generate and integrate multi-view features with larger receptive fields. Moreover, MvFSeg introduces multi-view feature augmentation strategies that exploit spatial and semantic guidance to augment shallow and deep features in both an inter-layer and an intra-layer manner. These strategies eliminate feature gaps between multi-level features, filter out the noise of spatial details, and provide spatial and semantic guidance for multi-level features. By combining multi-view features and augmented features from the lightweight networks with progressive dense aggregation structures, MvFSeg effectively captures invariance at various scales and generates high-quality segmentation results. Experiments on the Cityscapes and CamVid benchmarks show that MvFSeg outperforms existing state-of-the-art methods.
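
As a rough illustration of why parallel convolution branches of different depths yield features with different receptive fields, the sketch below computes the receptive field of a stack of convolutions using the standard recurrence. The branch depths (1, 2, 3) and the 3x3, stride-1, undilated kernels are assumptions chosen for the example; the abstract does not specify the configuration used in MvFSeg, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of convolution layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples in
    application order. Uses the standard recurrence:
        rf   += (kernel - 1) * dilation * jump
        jump *= stride
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical context head: three parallel branches of depth 1, 2 and 3,
# each built from 3x3, stride-1, undilated convolutions (an assumption,
# not the configuration reported in the paper).
branches = {depth: [(3, 1, 1)] * depth for depth in (1, 2, 3)}
views = {depth: receptive_field(stack) for depth, stack in branches.items()}
print(views)  # {1: 3, 2: 5, 3: 7}
```

Under these assumptions, each additional layer widens the view by two pixels, so concatenating or summing the branch outputs mixes several receptive-field sizes at the same spatial resolution.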

Supplementary Material

MP4 File (MM22-fp0222.mp4)
Traditional real-time segmentation methods struggle to segment complex scenes because of weak feature representation ability, insufficient perceptual regions, limited feature views, and feature gaps among multi-level features. To alleviate these problems, this paper proposes a new real-time segmentation network, MvFSeg, for multi-view feature augmentation and aggregation. In MvFSeg, a lightweight backbone network is adopted to generate 4-stage hierarchical shallow and deep features. Parallel multiple-depth convolution is proposed to generate higher-level features with multi-view receptive fields. Multi-level feature augmentation is designed to provide spatial information and semantic context to features at all levels, eliminating spatial gaps and semantic gaps among different features. Progressive dense feature aggregation is devised to fuse the original multi-level features with the augmented features, increasing feature views and enhancing feature representation ability.
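
To make the idea of semantic guidance for low-level features more concrete, here is a minimal, generic sketch of attention-style gating on 1-D toy data: coarse high-level scores are upsampled and passed through a sigmoid to reweight fine low-level responses. This is not the augmentation module from the paper, whose design is not described on this page; the function names, the nearest-neighbour upsampling, and the sigmoid gate are all assumptions made for illustration.

```python
import math


def upsample_nearest(values, factor):
    """Repeat each element `factor` times (1-D nearest-neighbour upsampling)."""
    return [v for v in values for _ in range(factor)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_gate(low, high, factor):
    """Reweight low-level responses with a gate derived from high-level scores.

    Illustrative only: output[i] = low[i] * sigmoid(upsampled_high[i]).
    """
    gate = [sigmoid(h) for h in upsample_nearest(high, factor)]
    if len(gate) != len(low):
        raise ValueError("len(low) must equal len(high) * factor")
    return [l * g for l, g in zip(low, gate)]


low = [1.0, 1.0, 1.0, 1.0]   # toy low-level responses (fine resolution)
high = [4.0, -4.0]           # toy high-level scores (coarse resolution)
out = semantic_gate(low, high, factor=2)
print(out)
```

In this toy case, positions under a strongly positive high-level score keep most of their low-level response, while positions under a strongly negative score are suppressed, which is one simple way semantic context could filter noisy spatial detail.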

    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022

    Author Tags

    1. attention mechanism
    2. feature aggregation
    3. feature augmentation
    4. multi-view features
    5. real-time semantic segmentation

    Qualifiers

    • Research-article

    Conference

    MM '22
    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 48
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 16 Feb 2025

    Citations

    Cited By

    • (2024) Deep Segmentation Techniques for Breast Cancer Diagnosis. BioMedInformatics 4:2 (921-945). DOI: 10.3390/biomedinformatics4020052. Online publication date: 1-Apr-2024
    • (2024) Research progress and challenges in real-time semantic segmentation for deep learning. Journal of Image and Graphics 29:5 (1188-1220). DOI: 10.11834/jig.230605. Online publication date: 2024
    • (2023) Informative Classes Matter: Towards Unsupervised Domain Adaptive Nighttime Semantic Segmentation. Proceedings of the 31st ACM International Conference on Multimedia (163-172). DOI: 10.1145/3581783.3611956. Online publication date: 26-Oct-2023
    • (2023) Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment. Proceedings of the 31st ACM International Conference on Multimedia (8515-8524). DOI: 10.1145/3581783.3611849. Online publication date: 26-Oct-2023
    • (2023) Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation. Proceedings of the 31st ACM International Conference on Multimedia (2544-2552). DOI: 10.1145/3581783.3611848. Online publication date: 26-Oct-2023
    • (2023) Human-Object-Object Interaction: Towards Human-Centric Complex Interaction Detection. Proceedings of the 31st ACM International Conference on Multimedia (2233-2242). DOI: 10.1145/3581783.3611813. Online publication date: 26-Oct-2023
    • (2023) Procontext: Exploring Progressive Context Transformer for Tracking. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1-5). DOI: 10.1109/ICASSP49357.2023.10094971. Online publication date: 4-Jun-2023
    • (2023) Longshortnet: Exploring Temporal and Semantic Features Fusion In Streaming Perception. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (1-5). DOI: 10.1109/ICASSP49357.2023.10094855. Online publication date: 4-Jun-2023
