Interaction semantic segmentation network via progressive supervised learning

  • RESEARCH
Machine Vision and Applications

Abstract

Semantic segmentation requires both low-level details and high-level semantics, and a practical network must preserve detail while keeping inference fast. Most existing segmentation approaches leverage low- and high-level features from pre-trained models. We propose an interaction semantic segmentation network via progressive supervised learning (ISSNet). Rather than simply fusing the two sets of features, we introduce an information interaction module that embeds semantics into image details, so that the two jointly guide the feature response in an interactive way. We develop a simple yet effective boundary refinement module that provides refined boundary features to match the corresponding semantics. We introduce a progressive supervised learning strategy that significantly improves network performance at the training level rather than the architecture level. The proposed ISSNet shows optimal inference time. We perform extensive experiments on four datasets: Cityscapes, HazeCityscapes, RainCityscapes and CamVid. In addition to performing better in fine weather, the proposed ISSNet also performs well in rainy and foggy conditions. We also conduct an ablation study to demonstrate the role of each proposed component. Code is available at: https://github.com/Ruini94/ISSNet
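
The abstract does not specify the internals of the interaction module or the supervision schedule, so the following is only a minimal sketch of the general idea, not the authors' implementation (see the linked repository for that). It assumes a mutual sigmoid gating between a detail map and an upsampled semantic map, and a weighted sum of per-stage losses as one possible form of progressive supervision. The names `interact`, `upsample_nearest` and `progressive_loss` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used here as a soft gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def interact(detail, semantic, factor):
    """Assumed interaction: semantics gate details and details gate semantics.

    detail   -- high-resolution, low-level feature map (2-D list)
    semantic -- low-resolution, high-level feature map (2-D list)
    factor   -- integer scale between the two resolutions
    """
    sem_up = upsample_nearest(semantic, factor)
    if len(sem_up) != len(detail) or len(sem_up[0]) != len(detail[0]):
        raise ValueError("feature maps do not align after upsampling")
    fused = []
    for d_row, s_row in zip(detail, sem_up):
        fused.append([d * sigmoid(s) + s * sigmoid(d)
                      for d, s in zip(d_row, s_row)])
    return fused


def progressive_loss(stage_losses, weights):
    """Assumed progressive supervision: weighted sum of per-stage losses."""
    return sum(w * l for w, l in zip(weights, stage_losses))


if __name__ == "__main__":
    detail = [[1.0, 1.0], [1.0, 1.0]]
    semantic = [[0.0]]
    print(interact(detail, semantic, factor=2))
    print(progressive_loss([1.0, 2.0], [0.5, 1.0]))
```

In this sketch each branch modulates the other instead of the two being added or concatenated directly, which is one way to read "jointly guide the response of features in an interactive way". The loss helper touches only the training objective, in line with the claim that the strategy works at the training level rather than the architecture level.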

Data availability

The data used to support the findings of this study are freely accessible. Please refer to the following links:

  • https://www.cityscapes-dataset.com/
  • http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2020YFB1713300 and by the Youth Innovation Promotion Association CAS (Grant No. 2023419).

Contributions

RZ: development and design of methodology, creation of models, implementation of the code and testing of existing algorithms, draft writing; MX: idea, formulation and evolution of overarching research goals and aims, language modification; XF: verification of the overall reproducibility of results, funding acquisition, visualization of data; MG: data curation, code optimization; XS: oversight and leadership responsibility for the research activity planning and execution, writing revision; PZ: formal analysis, funding acquisition.

Corresponding author

Correspondence to Xubin Feng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhao, R., Xie, M., Feng, X. et al. Interaction semantic segmentation network via progressive supervised learning. Machine Vision and Applications 35, 26 (2024). https://doi.org/10.1007/s00138-023-01500-4
