
General and Task-Oriented Video Segmentation

  • Conference paper
  • Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15065)

Abstract

We present GvSeg, a general video segmentation framework that addresses four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) with an identical architectural design. There is currently a trend towards general video segmentation solutions that apply across multiple tasks, which streamlines research efforts and simplifies deployment. However, the highly homogenized current designs, in which every component is kept uniform across tasks, can overlook the inherent diversity among tasks and lead to suboptimal performance. To tackle this, GvSeg: i) provides a holistic disentanglement and modeling of segment targets, examining them from the perspectives of appearance, position, and shape, and, on this basis, ii) reformulates the query initialization, matching, and sampling strategies in alignment with task-specific requirements. These architecture-agnostic innovations enable GvSeg to address each task effectively by accommodating the specific properties that characterize it. Extensive experiments on seven gold-standard benchmark datasets demonstrate that GvSeg surpasses all existing specialized/general solutions by a significant margin on four different video segmentation tasks.
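To make the second point concrete: in query-based segmenters, predicted queries are assigned to ground-truth segments by bipartite (Hungarian) matching over a cost matrix. The paper's contribution is to build that cost from disentangled appearance, position, and shape cues, with task-specific weighting. The sketch below is purely illustrative — the function name, L2 costs, and weights are our assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_queries(q_app, q_pos, q_shp, t_app, t_pos, t_shp,
                  w_app=1.0, w_pos=1.0, w_shp=1.0):
    """Hungarian matching between N query and M target descriptors.

    Each cue (appearance, position, shape) contributes a separate pairwise
    cost term; the task-specific weights w_* rebalance the cues per task.
    Returns a list of (query_index, target_index) pairs.
    """
    # Pairwise L2 distance per cue, each of shape (N, M), via broadcasting.
    c_app = np.linalg.norm(q_app[:, None] - t_app[None], axis=-1)
    c_pos = np.linalg.norm(q_pos[:, None] - t_pos[None], axis=-1)
    c_shp = np.linalg.norm(q_shp[:, None] - t_shp[None], axis=-1)
    cost = w_app * c_app + w_pos * c_pos + w_shp * c_shp
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

A task-oriented variant would simply change the weights, e.g. down-weighting position (`w_pos=0.1`) for semantic segmentation, where class identity matters more than instance location, while keeping it high for instance tracking.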



Author information

Corresponding author: Yi Yang


Electronic supplementary material

Supplementary material 1 (pdf 6535 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, M., Li, L., Wang, W., Quan, R., Yang, Y. (2025). General and Task-Oriented Video Segmentation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15065. Springer, Cham. https://doi.org/10.1007/978-3-031-72667-5_5


  • DOI: https://doi.org/10.1007/978-3-031-72667-5_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72666-8

  • Online ISBN: 978-3-031-72667-5

  • eBook Packages: Computer Science, Computer Science (R0)
