Object-Contextual Representations for Semantic Segmentation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12351)

Abstract

In this paper, we study the context aggregation problem in semantic segmentation. Motivated by the fact that the label of a pixel is the category of the object that the pixel belongs to, we present a simple yet effective approach, object-contextual representations (OCR), which characterizes a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each pixel and each object region, and augment the representation of each pixel with the object-contextual representation, which is a weighted aggregation of all the object region representations. We empirically demonstrate that our method achieves competitive performance on various benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context and COCO-Stuff. Our submission “HRNet + OCR + SegFix” achieved \(1^{\mathrm{st}}\) place on the Cityscapes leaderboard by the ECCV 2020 submission deadline. Code is available at https://git.io/openseg and https://git.io/HRNet.OCR.
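The three steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `ocr_sketch`, the softmax normalizations, the dot-product relation, and the final concatenation are assumptions made for illustration, and the learned transformations that a real network would apply are omitted. The coarse per-pixel class scores are taken as given here, whereas in the paper they are learned under ground-truth supervision.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def ocr_sketch(pixel_feats, region_logits):
    """Illustrative object-contextual representation (assumed form).

    pixel_feats:   (N, C) pixel representations, N = H * W.
    region_logits: (N, K) coarse per-pixel scores for K object classes.
    Returns an (N, 2C) array: each pixel feature augmented with its context.
    """
    # Step 1: soft object regions. For each class, normalize the scores over
    # all pixels so that every column is a spatial distribution (assumption).
    regions = softmax(region_logits, axis=0)                  # (N, K)

    # Step 2: object region representations, i.e. a weighted sum of the
    # pixel representations belonging to each region.
    region_feats = regions.T @ pixel_feats                    # (K, C)

    # Step 3: pixel-region relation, normalized over regions, followed by a
    # weighted aggregation of all object region representations.
    relation = softmax(pixel_feats @ region_feats.T, axis=1)  # (N, K)
    context = relation @ region_feats                         # (N, C)

    # Augment each pixel representation with its object-contextual one
    # (concatenation is an assumption of this sketch).
    return np.concatenate([pixel_feats, context], axis=1)


# Usage on random data: a 3 x 4 image, 4 channels, 3 object classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3 * 4, 4))
logits = rng.normal(size=(3 * 4, 3))
out = ocr_sketch(feats, logits)
print(out.shape)  # (12, 8)
```

Because both aggregation steps use weights that sum to one, every context vector in this sketch is a convex combination of the input pixel features.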

Notes

  1. We use “object” to represent both “things” and “stuff” following [14, 53].

  2. See Sect. 3.4 for more details.

  3. Only a few methods adopt multi-scale testing. For example, CNIF [61] improves its performance from \(56.93\%\) to \(57.74\%\) with it.

  4. https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md

References

  1. Arbeláez, P., Hariharan, B., Gu, C., Gupta, S., Bourdev, L., Malik, J.: Semantic segmentation using regions and parts. In: CVPR (2012)

  2. Caesar, H., Uijlings, J., Ferrari, V.: Region-based semantic segmentation with end-to-end training. In: ECCV (2016)

  3. Caesar, H., Uijlings, J., Ferrari, V.: COCO-Stuff: thing and stuff classes in context. In: CVPR (2018)

  4. Chen, L.C., et al.: Searching for efficient multi-scale architectures for dense image prediction. In: NIPS (2018)

  5. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. PAMI 40(4), 834–848 (2018)

  6. Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv:1706.05587 (2017)

  7. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV (2018)

  8. Chen, Y., Kalantidis, Y., Li, J., Yan, S., Feng, J.: A\(^2\)-Nets: double attention networks. In: NIPS (2018)

  9. Chen, Y., Rohrbach, M., Yan, Z., Yan, S., Feng, J., Kalantidis, Y.: Graph-based global reasoning networks. arXiv:1811.12814 (2018)

  10. Cheng, B., et al.: SPGNet: semantic prediction guidance for scene parsing. In: ICCV (2019)

  11. Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)

  12. Ding, H., Jiang, X., Liu, A.Q., Thalmann, N.M., Wang, G.: Boundary-aware feature propagation for scene segmentation. In: ICCV (2019)

  13. Ding, H., Jiang, X., Shuai, B., Liu, A.Q., Wang, G.: Semantic correlation promoted shape-variant context for segmentation. In: CVPR (2019)

  14. Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. PAMI 35(8), 1915–1929 (2012)

  15. Fieraru, M., Khoreva, A., Pishchulin, L., Schiele, B.: Learning to refine human pose estimation. In: CVPRW (2018)

  16. Fu, J., Liu, J., Tian, H., Fang, Z., Lu, H.: Dual attention network for scene segmentation. arXiv:1809.02983 (2018)

  17. Fu, J., et al.: Adaptive context network for scene parsing. In: ICCV (2019)

  18. Gidaris, S., Komodakis, N.: Detect, replace, refine: deep structured prediction for pixel wise labeling. In: CVPR (2017)

  19. Gong, K., Liang, X., Zhang, D., Shen, X., Lin, L.: Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: CVPR (2017)

  20. Gould, S., Fulton, R., Koller, D.: Decomposing a scene into geometric and semantically consistent regions. In: ICCV (2009)

  21. Gu, C., Lim, J.J., Arbelaez, P., Malik, J.: Recognition using regions. In: CVPR (2009)

  22. He, J., Deng, Z., Zhou, L., Wang, Y., Qiao, Y.: Adaptive pyramid context network for semantic segmentation. In: CVPR (2019)

  23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

  24. Huang, L., Yuan, Y., Guo, J., Zhang, C., Chen, X., Wang, J.: Interlaced sparse self-attention for semantic segmentation. arXiv:1907.12273 (2019)

  25. Huang, Y.H., Jia, X., Georgoulis, S., Tuytelaars, T., Van Gool, L.: Error correction for dense semantic image labeling. In: CVPRW (2018)

  26. Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: CCNet: criss-cross attention for semantic segmentation. In: ICCV (2019)

  27. Islam, M.A., Naha, S., Rochan, M., Bruce, N., Wang, Y.: Label refinement network for coarse-to-fine semantic segmentation. arXiv:1703.00551 (2017)

  28. Ke, T.W., Hwang, J.J., Liu, Z., Yu, S.X.: Adaptive affinity fields for semantic segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. Lecture Notes in Computer Science, vol. 11205, pp. 605–621. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_36

  29. Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: CVPR (2019)

  30. Kirillov, A., He, K., Girshick, R., Rother, C., Dollár, P.: Panoptic segmentation. In: CVPR (2019)

  31. Kong, S., Fowlkes, C.C.: Recurrent scene parsing with perspective understanding in the loop. In: CVPR (2018)

  32. Kuo, W., Angelova, A., Malik, J., Lin, T.Y.: ShapeMask: learning to segment novel objects by refining shape priors (2019)

  33. Li, K., Hariharan, B., Malik, J.: Iterative instance segmentation. In: CVPR (2016)

  34. Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., Liu, H.: Expectation-maximization attention networks for semantic segmentation. In: ICCV (2019)

  35. Li, X., Zhang, L., You, A., Yang, M., Yang, K., Tong, Y.: Global aggregation then local distribution in fully convolutional networks. In: BMVC (2019)

  36. Li, X., Liu, Z., Luo, P., Change Loy, C., Tang, X.: Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade. In: CVPR (2017)

  37. Li, Y., Gupta, A.: Beyond grids: learning graph representations for visual recognition. In: NIPS (2018)

  38. Liang, X., Gong, K., Shen, X., Lin, L.: Look into person: joint body parsing & pose estimation network and a new benchmark. PAMI (2018)

  39. Liang, X., Hu, Z., Zhang, H., Lin, L., Xing, E.P.: Symbolic graph reasoning meets convolutions. In: NIPS (2018)

  40. Liang, X., Zhou, H., Xing, E.: Dynamic-structured semantic propagation network. In: CVPR (2018)

  41. Lin, D., et al.: ZigZagNet: fusing top-down and bottom-up context for object segmentation. In: CVPR (2019)

  42. Lin, T.Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014. Lecture Notes in Computer Science, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  43. Liu, H., et al.: An end-to-end network for panoptic segmentation. In: CVPR (2019)

  44. Liu, T., et al.: Devil in the details: towards accurate single and multiple human parsing. arXiv:1809.05996 (2018)

  45. Liu, W., Rabinovich, A., Berg, A.C.: ParseNet: looking wider to see better. arXiv:1506.04579 (2015)

  46. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR (2015)

  47. Luo, Y., Zheng, Z., Zheng, L., Tao, G., Junqing, Y., Yang, Y.: Macro-micro adversarial network for human parsing. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. Lecture Notes in Computer Science, vol. 11213, pp. 424–440. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_26

  48. Mottaghi, R., et al.: The role of context for object detection and semantic segmentation in the wild. In: CVPR (2014)

  49. Neuhold, G., Ollmann, T., Rota Bulò, S., Kontschieder, P.: The Mapillary Vistas dataset for semantic understanding of street scenes. In: CVPR (2017)

  50. Nigam, I., Huang, C., Ramanan, D.: Ensemble knowledge transfer for semantic segmentation. In: WACV (2018)

  51. Pang, Y., Li, Y., Shen, J., Shao, L.: Towards bridging semantic gap to improve semantic segmentation. In: ICCV (2019)

  52. Rota Bulò, S., Porzi, L., Kontschieder, P.: In-place activated batchnorm for memory-optimized training of DNNs. In: CVPR (2018)

  53. Shetty, R., Schiele, B., Fritz, M.: Not using the car to see the sidewalk-quantifying and controlling the effects of context in classification and segmentation. In: CVPR (2019)

  54. Sun, K., et al.: High-resolution representations for labeling pixels and regions. arXiv:1904.04514 (2019)

  55. Takikawa, T., Acuna, D., Jampani, V., Fidler, S.: Gated-SCNN: gated shape CNNs for semantic segmentation. In: ICCV (2019)

  56. Tao, A., Sapra, K., Catanzaro, B.: Hierarchical multi-scale attention for semantic segmentation. arXiv:2005.10821 (2020)

  57. Tian, Z., He, T., Shen, C., Yan, Y.: Decoders matter for semantic segmentation: data-dependent decoding enables flexible feature aggregation. In: CVPR (2019)

  58. Tu, Z., Bai, X.: Auto-context and its application to high-level vision tasks and 3D brain image segmentation. PAMI 32(10), 1744–1757 (2010)

  59. Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. IJCV 104, 154–171 (2013)

  60. Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)

  61. Wang, W., Zhang, Z., Qi, S., Shen, J., Pang, Y., Shao, L.: Learning compositional neural information fusion for human parsing. In: ICCV (2019)

  62. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: CVPR (2018)

  63. Wei, Y., Feng, J., Liang, X., Cheng, M.M., Zhao, Y., Yan, S.: Object region mining with adversarial erasing: a simple classification to semantic segmentation approach. In: CVPR (2017)

  64. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2. https://github.com/facebookresearch/detectron2 (2019)

  65. Xiong, Y., et al.: UPSNet: a unified panoptic segmentation network. In: CVPR (2019)

  66. Xu, J., Chen, K., Lin, D.: MMSegmentation. https://github.com/open-mmlab/mmsegmentation (2020)

  67. Yang, M., Yu, K., Zhang, C., Li, Z., Yang, K.: DenseASPP for semantic segmentation in street scenes. In: CVPR (2018)

  68. Yang, Y., Li, H., Li, X., Zhao, Q., Wu, J., Lin, Z.: SogNet: scene overlap graph network for panoptic segmentation. arXiv:1911.07527 (2019)

  69. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: ICLR (2016)

  70. Yuan, Y., Wang, J.: OCNet: object context network for scene parsing. arXiv:1809.00916 (2018)

  71. Yuan, Y., Xie, J., Chen, X., Wang, J.: SegFix: model-agnostic boundary refinement for segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol. 12357, pp. 489–506. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_29

  72. Yue, K., Sun, M., Yuan, Y., Zhou, F., Ding, E., Xu, F.: Compact generalized non-local network. In: NIPS (2018)

  73. Zhang, F., et al.: ACFNet: attentional class feature network for semantic segmentation. In: ICCV (2019)

  74. Zhang, H., et al.: Context encoding for semantic segmentation. In: CVPR (2018)

  75. Zhang, H., Zhang, H., Wang, C., Xie, J.: Co-occurrent features in semantic segmentation. In: CVPR (2019)

  76. Zhang, L., Li, X., Arnab, A., Yang, K., Tong, Y., Torr, P.H.: Dual graph convolutional network for semantic segmentation. In: BMVC (2019)

  77. Zhang, R., Tang, S., Zhang, Y., Li, J., Yan, S.: Scale-adaptive convolutions for scene parsing. In: ICCV (2017)

  78. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)

  79. Zhao, H., et al.: PSANet: point-wise spatial attention network for scene parsing. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. Lecture Notes in Computer Science, vol. 11213, pp. 270–286. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_17

  80. Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: CVPR (2017)

  81. Zhu, Y., et al.: Improving semantic segmentation via video propagation and label relaxation. In: CVPR (2019)

  82. Zhu, Z., Xu, M., Bai, S., Huang, T., Bai, X.: Asymmetric non-local neural networks for semantic segmentation. In: ICCV (2019)

  83. Zhu, Z., Xia, Y., Shen, W., Fishman, E., Yuille, A.: A 3D coarse-to-fine framework for volumetric medical image segmentation. In: 3DV (2018)

Acknowledgement

This work is partially supported by the Natural Science Foundation of China under contract No. 61390511, and by the Frontier Science Key Research Project CAS No. QYZDJ-SSW-JSC009.

Author information

Corresponding author

Correspondence to Jingdong Wang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yuan, Y., Chen, X., Wang, J. (2020). Object-Contextual Representations for Semantic Segmentation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12351. Springer, Cham. https://doi.org/10.1007/978-3-030-58539-6_11

  • DOI: https://doi.org/10.1007/978-3-030-58539-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58538-9

  • Online ISBN: 978-3-030-58539-6

  • eBook Packages: Computer Science, Computer Science (R0)
