Open Panoramic Segmentation

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, obtaining enough densely annotated panoramas for training is costly, and models trained in a closed-vocabulary setting are restricted in their applications. To tackle this problem, we define a new task termed Open Panoramic Segmentation (OPS), in which models are trained on FoV-restricted pinhole images in the source domain under an open-vocabulary setting and evaluated on FoV-open panoramic images in the target domain, enabling zero-shot open-vocabulary panoramic semantic segmentation. Moreover, we propose a model named OOOPS with a Deformable Adapter Network (DAN), which significantly improves zero-shot panoramic semantic segmentation performance. To further enhance distortion-aware modeling from the pinhole source domain, we propose a novel data augmentation method called Random Equirectangular Projection (RERP), specifically designed to address object deformations in advance. Surpassing other state-of-the-art open-vocabulary semantic segmentation approaches, a remarkable performance boost on three panoramic datasets, WildPASS, Stanford2D3D, and Matterport3D, proves the effectiveness of the proposed OOOPS model with RERP on the OPS task, notably +2.2% mIoU on outdoor WildPASS and +2.4% mIoU on indoor Stanford2D3D. The source code is publicly available at OPS.
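
The abstract describes Random Equirectangular Projection (RERP) only at a high level. As a rough illustration of the general idea, warping a FoV-restricted pinhole image onto a randomly placed region of an equirectangular canvas so that panorama-like distortions already appear during pinhole training, the NumPy sketch below may be helpful. The function name, the random-placement scheme, and all parameters are illustrative assumptions rather than the authors' implementation; consult the released source code for the actual RERP.

```python
import numpy as np

def random_equirectangular_warp(img, hfov_deg=90.0, out_hw=(256, 512), rng=None):
    """Warp a pinhole image onto a random patch of an equirectangular canvas.

    Hypothetical sketch: the placement scheme and parameters are assumptions,
    not the RERP implementation from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    h_in, w_in = img.shape[:2]
    h_out, w_out = out_hw

    # Pinhole intrinsics derived from the assumed horizontal field of view.
    fx = (w_in / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)
    fy = fx
    cx, cy = w_in / 2.0, h_in / 2.0

    # Random offset of the patch centre relative to the camera axis, so the
    # warped image lands at a varying position inside the patch, plus a
    # random angular extent of the patch.
    lon_c = rng.uniform(-np.pi / 6, np.pi / 6)
    lat_c = rng.uniform(-np.pi / 6, np.pi / 6)
    lon_span = np.radians(rng.uniform(hfov_deg, 2.0 * hfov_deg))
    lat_span = lon_span * h_out / w_out

    # Longitude/latitude of every output pixel in the equirectangular patch.
    jj, ii = np.meshgrid(np.arange(w_out), np.arange(h_out))
    lon = lon_c + (jj / w_out - 0.5) * lon_span
    lat = lat_c + (0.5 - ii / h_out) * lat_span

    # Ray directions on the unit sphere, projected back into the pinhole
    # camera that looks along +z.
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    valid = z > 1e-6
    u = np.where(valid, fx * x / np.maximum(z, 1e-6) + cx, -1.0)
    v = np.where(valid, -fy * y / np.maximum(z, 1e-6) + cy, -1.0)

    # Nearest-neighbour sampling; rays that miss the pinhole image stay black.
    ui = np.round(u).astype(int)
    vi = np.round(v).astype(int)
    inside = valid & (ui >= 0) & (ui < w_in) & (vi >= 0) & (vi < h_in)
    out = np.zeros((h_out, w_out) + img.shape[2:], dtype=img.dtype)
    out[inside] = img[vi[inside], ui[inside]]
    return out
```

In such an augmentation, the same warp would be applied to the segmentation labels (with nearest-neighbour sampling) so that images and ground truth stay aligned.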

Acknowledgements

This work was supported in part by the Ministry of Science, Research and the Arts of Baden-Württemberg (MWK) through the Cooperative Graduate School Accessibility through AI-based Assistive Technology (KATE) under Grant BW6-03, in part by BMBF through a fellowship within the IFI programme of DAAD, in part by the Helmholtz Association Initiative and Networking Fund on the HAICORE@KIT and HOREKA@KIT partition, in part by the National Key R&D Program under Grant 2022YFB4701400, and in part by Hangzhou SurImage Technology Company Ltd.

Author information


Corresponding author

Correspondence to Jiaming Zhang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 15018 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zheng, J. et al. (2025). Open Panoramic Segmentation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15097. Springer, Cham. https://doi.org/10.1007/978-3-031-72933-1_10

  • DOI: https://doi.org/10.1007/978-3-031-72933-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72932-4

  • Online ISBN: 978-3-031-72933-1

  • eBook Packages: Computer Science, Computer Science (R0)
