Interaction semantic segmentation network via progressive supervised learning

Zhao, Ruini; Xie, Meilin; Feng, Xubin; Guo, Min; Su, Xiuqin; Zhang, Ping

doi:10.1007/s00138-023-01500-4

RESEARCH
Published: 05 February 2024

Volume 35, article number 26 (2024)

Machine Vision and Applications

Ruini Zhao¹, Meilin Xie¹, Xubin Feng¹, Min Guo¹, Xiuqin Su¹ & Ping Zhang²

Abstract

Semantic segmentation requires both low-level details and high-level semantics: a network must avoid losing too much detail while maintaining inference speed. Most existing segmentation approaches leverage low- and high-level features from pre-trained models. We propose an interaction semantic segmentation network via progressive supervised learning (ISSNet). Rather than simply fusing the two sets of features, we introduce an information interaction module that embeds semantics into image details, so that the two jointly guide the feature response in an interactive way. We develop a simple yet effective boundary refinement module that provides refined boundary features to match the corresponding semantics. We introduce a progressive supervised learning strategy that operates at the training level, not the architecture level, to significantly improve network performance. ISSNet shows optimal inference time. We perform extensive experiments on four datasets: Cityscapes, HazeCityscapes, RainCityscapes and CamVid. In addition to performing better in fine weather, ISSNet also performs well on rainy and foggy days. We also conduct an ablation study to demonstrate the role of our proposed components. Code is available at: https://github.com/Ruini94/ISSNet

Data availability

The data used to support the findings of this study are freely accessible. Please refer to the following links: https://www.cityscapes-dataset.com/ and http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2020YFB1713300 and the Youth Innovation Promotion Association CAS (Grant No. 2023419).

Author information

Authors and Affiliations

Xi’an Institute of Optics and Precision Mechanics of the Chinese Academy of Sciences, Xi’an, 710119, China
Ruini Zhao, Meilin Xie, Xubin Feng, Min Guo & Xiuqin Su
Chang’an University, Xi’an, 710064, China
Ping Zhang

Contributions

RZ: Development and design of methodology, creation of models, implementation of the code and testing of existing algorithms, draft writing; MX: Idea, formulation and evolution of overarching research goals and aims, language modification; XF: Verification of the overall reproducibility of results, funding acquisition, visualization of data; MG: Data curation, code optimization; XS: Oversight and leadership responsibility for the research activity planning and execution, writing revision; PZ: Formal analysis, funding acquisition.

Corresponding author

Correspondence to Xubin Feng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, R., Xie, M., Feng, X. et al. Interaction semantic segmentation network via progressive supervised learning. Machine Vision and Applications 35, 26 (2024). https://doi.org/10.1007/s00138-023-01500-4

Received: 09 August 2023
Revised: 07 December 2023
Accepted: 09 December 2023
Published: 05 February 2024
DOI: https://doi.org/10.1007/s00138-023-01500-4
