research-article

MCFNet: Multi-Attentional Class Feature Augmentation Network for Real-Time Scene Parsing

Authors:

Dongsheng Zhou

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 20, Issue 6

Article No.: 166, Pages 1 - 17

https://doi.org/10.1145/3639053

Published: 08 March 2024

Abstract

For real-time scene parsing, capturing multi-scale semantic features and fusing them effectively are both crucial. However, many existing solutions ignore strip-shaped objects such as poles and traffic lights, and are so computationally expensive that they cannot meet strict real-time requirements. This article presents a novel model, the Multi-Attentional Class Feature Augmentation Network (MCFNet), to address these challenges. MCFNet is designed to capture long-range dependencies across different scales at low computational cost and to perform a weighted fusion of feature maps. It features the BAM (Strip Matrix Based Attention Module) for extracting strip objects in images. The BAM replaces the square matrices of conventional self-attention with strip matrices, which lets it focus more on strip-shaped objects while reducing computation. Additionally, MCFNet has a parallel branch that uses self-attention to capture global information without wasting computation. The two branches are merged to improve on the performance of traditional self-attention modules. Experimental results on two mainstream datasets demonstrate the effectiveness of MCFNet. On the CamVid and Cityscapes test sets, MCFNet achieves 207.5 FPS at 73.5% mIoU and 136.1 FPS at 71.63% mIoU, respectively. The experiments show that MCFNet outperforms other models on the CamVid dataset and can significantly improve the performance of real-time scene parsing.
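The abstract does not give the BAM's equations, so the following is only a minimal sketch of the general idea behind strip-based attention, not the authors' module. It assumes (hypothetically) that strip tokens are formed by averaging each row and each column of a C x H x W feature map, and that every pixel then attends to those H + W tokens instead of to all H*W pixels. Learned query/key/value projections are omitted, and the names `strip_attention` and `attention_matrix_sizes` are illustrative.

```python
import math


def strip_attention(x):
    """Attend from every pixel to H + W strip tokens (illustrative only).

    x is a feature map given as nested lists with shape [C][H][W].
    Returns a new feature map with the same shape.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])

    # Assumed strip pooling: one token per row (mean over width)
    # and one token per column (mean over height).
    row_tokens = [[sum(x[c][i]) / W for c in range(C)] for i in range(H)]
    col_tokens = [
        [sum(x[c][i][j] for i in range(H)) / H for c in range(C)]
        for j in range(W)
    ]
    tokens = row_tokens + col_tokens  # H + W tokens, each a C-vector

    scale = 1.0 / math.sqrt(C)
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            q = [x[c][i][j] for c in range(C)]  # pixel feature used as query
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
            m = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            for c in range(C):
                out[c][i][j] = sum(w * t[c] for w, t in zip(weights, tokens)) / total
    return out


def attention_matrix_sizes(H, W):
    """Entries in the attention matrix: (full self-attention, strip variant)."""
    n = H * W
    return n * n, n * (H + W)
```

For a 64 x 128 feature map, `attention_matrix_sizes(64, 128)` returns 67,108,864 entries for full self-attention versus 1,572,864 for the strip variant, which is where the saving in this sketch comes from.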

Index Terms

MCFNet: Multi-Attentional Class Feature Augmentation Network for Real-Time Scene Parsing
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation

Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 6

June 2024

715 pages

EISSN:1551-6865

DOI:10.1145/3613638

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 March 2024

Online AM: 15 January 2024

Accepted: 17 December 2023

Revised: 11 October 2023

Received: 22 February 2023

Published in TOMM Volume 20, Issue 6

Qualifiers

Research-article

Funding Sources

Key Project of NSFC
Program for Innovative Research Team in University of Liaoning Province
Support Plan for Key Field Innovation Team of Dalian
Support Plan for Leading Innovation Team of Dalian University
Science and Technology Innovation Fund of Dalian
111 Project

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 4
Total Downloads: 195
Downloads (Last 12 months): 143
Downloads (Last 6 weeks): 10

Reflects downloads up to 14 Feb 2025

Citations

Cited By

Fu L, Liu Y, Zhang Y, Li M. (2024). Network Information Security Monitoring Under Artificial Intelligence Environment. International Journal of Information Security and Privacy 18:1 (1-25). DOI: 10.4018/IJISP.345038. Online publication date: 21-Jun-2024
https://dl.acm.org/doi/10.4018/IJISP.345038
Gao H, Su Y, Wang F, Li H. (2024). Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-24). DOI: 10.1145/3656476. Online publication date: 15-May-2024
https://dl.acm.org/doi/10.1145/3656476
Wang S, Yan Y, Han F, Tian Y, Ding Y, Yang P, Li X, Okoshi T, Ko J, LiKamWa R. (2024). MultiRider: Enabling Multi-Tag Concurrent OFDM Backscatter by Taming In-band Interference. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (292-303). DOI: 10.1145/3643832.3661862. Online publication date: 3-Jun-2024
https://dl.acm.org/doi/10.1145/3643832.3661862
Xue M, Xu Z, Qiao S, Zheng J, Li T, Wang Y, Peng D. (2024). Driver intention prediction based on multi-dimensional cross-modality information interaction. Multimedia Systems 30:2. DOI: 10.1007/s00530-024-01282-3. Online publication date: 15-Mar-2024
https://dl.acm.org/doi/10.1007/s00530-024-01282-3
