(SP)\(^2\)Net for Generalized Zero-Label Semantic Segmentation

  • Conference paper
Pattern Recognition (DAGM GCPR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13024)

Abstract

Generalized zero-label semantic segmentation aims to make pixel-level predictions for both seen and unseen classes in an image. Prior works approach this task by leveraging semantic word embeddings to learn a semantic projection layer or to generate features of unseen classes. However, those methods rely on standard segmentation networks that may not generalize well to unseen classes. To address this issue, we propose to leverage a class-agnostic segmentation prior provided by superpixels and introduce a superpixel pooling (SP-pooling) module as an intermediate layer of a segmentation network. Also, while prior works ignore the pixels of unseen classes that appear in training images, we propose to minimize the log probability of seen classes, alleviating biased predictions in those ignored regions. We show that our (SP)\(^2\)Net significantly outperforms the state of the art on different data splits of the PASCAL VOC 2012 and PASCAL-Context benchmarks.
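
The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of average pooling, the array layouts, and the assumption that the logits cover seen and unseen classes jointly are all hypothetical, chosen only to make the idea concrete.

```python
import numpy as np


def superpixel_average_pool(features, superpixels):
    """Average features inside each superpixel and broadcast the mean back.

    features:    (H, W, C) float array of per-pixel features.
    superpixels: (H, W) int array of superpixel ids in [0, K).
    Returns an (H, W, C) array in which every pixel carries the mean
    feature of the superpixel it belongs to.
    Assumption: average pooling; the paper's module may differ.
    """
    h, w, c = features.shape
    ids = superpixels.reshape(-1)
    flat = features.reshape(-1, c)
    k = int(ids.max()) + 1
    sums = np.zeros((k, c), dtype=float)
    np.add.at(sums, ids, flat)                      # per-superpixel feature sums
    counts = np.bincount(ids, minlength=k).astype(float)
    means = sums / np.maximum(counts, 1.0)[:, None]  # guard unused ids
    return means[ids].reshape(h, w, c)


def seen_log_prob_on_ignored(logits, ignore_mask, seen_classes):
    """Mean log of the total seen-class probability over ignored pixels.

    logits:       (H, W, N) scores over all N classes (hypothetical layout).
    ignore_mask:  (H, W) bool array, True where the training label is ignored.
    seen_classes: list of integer indices of seen classes.
    Minimizing this value pushes probability mass away from seen classes
    in the ignored regions (one plausible reading of the abstract).
    """
    if not ignore_mask.any():
        return 0.0
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    seen_mass = probs[..., seen_classes].sum(axis=-1)
    return float(np.log(seen_mass[ignore_mask] + 1e-12).mean())


if __name__ == "__main__":
    feats = np.array([[[1.0], [3.0]], [[5.0], [7.0]]])
    sp = np.array([[0, 0], [1, 1]])
    print(superpixel_average_pool(feats, sp)[..., 0])
```

In this sketch the pooled features share one value per superpixel, so any classifier applied afterwards yields predictions that follow superpixel boundaries regardless of class, which is the sense in which the prior is class-agnostic.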

Y. Xian and Y. He: The majority of the work was done when Yongqin Xian and Yang He were with MPI for Informatics.

Notes

  1. (SP)\(^2\)Net refers to (SP)\(^2\)Net with COB superpixels, unless otherwise stated.

  2. COB-based superpixels are pretrained on object boundaries on the PASCAL dataset.

Acknowledgements

This work has been partially funded by the ERC (853489 - DEXIM) and by the DFG (2064/1 - Project number 390727645).

Author information

Corresponding author

Correspondence to Anurag Das.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 13488 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Das, A., Xian, Y., He, Y., Schiele, B., Akata, Z. (2021). (SP)\(^2\)Net for Generalized Zero-Label Semantic Segmentation. In: Bauckhage, C., Gall, J., Schwing, A. (eds) Pattern Recognition. DAGM GCPR 2021. Lecture Notes in Computer Science, vol 13024. Springer, Cham. https://doi.org/10.1007/978-3-030-92659-5_15

  • DOI: https://doi.org/10.1007/978-3-030-92659-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-92658-8

  • Online ISBN: 978-3-030-92659-5

  • eBook Packages: Computer Science, Computer Science (R0)
