Keypoint Promptable Re-Identification

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance. While many studies have tackled occlusions caused by objects, multi-person occlusions remain less explored. In this work, we identify and address a critical challenge overlooked by previous occluded ReID methods: the Multi-Person Ambiguity (MPA) arising when multiple individuals are visible in the same bounding box, making it impossible to determine the intended ReID target among the candidates. Inspired by recent work on prompting in vision, we introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Since promptable re-identification is an unexplored paradigm, existing ReID datasets lack the pixel-level annotations necessary for prompting. To bridge this gap and foster further research on this topic, we introduce Occluded PoseTrack-ReID, a novel ReID dataset with keypoint labels that features strong inter-person occlusions. Furthermore, we release custom keypoint labels for four popular ReID benchmarks. Experiments on person retrieval, but also on pose tracking, demonstrate that our method systematically surpasses previous state-of-the-art approaches on various occluded scenarios. Our code, dataset and annotations are available at https://github.com/VlSomers/keypoint_promptable_reidentification.
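
To make the formulation above more concrete, the following is a minimal sketch of what a keypoint-prompted ReID query could look like. It is not the authors' implementation: the `Keypoint` class, the Gaussian heatmap encoding of the prompt, the `sigma` value, and the cosine-distance ranking are all illustrative assumptions. The sketch only shows (i) one plausible way to turn a set of semantic keypoints into a spatial prompt aligned with the bounding-box crop, and (ii) how, in a metric learning setting, gallery samples are ranked by distance to the query embedding.

```python
import math
from dataclasses import dataclass


@dataclass
class Keypoint:
    """Hypothetical container for one semantic keypoint of the intended target."""
    x: float              # column, in pixels, relative to the bounding-box crop
    y: float              # row, in pixels, relative to the bounding-box crop
    visible: bool = True  # False if the body part is occluded or not annotated


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render one Gaussian heatmap per keypoint (assumed prompt encoding).

    Invisible keypoints produce an all-zero map, so the prompt only
    highlights body parts that belong to the intended target.
    """
    heatmaps = []
    for kp in keypoints:
        if not kp.visible:
            heatmaps.append([[0.0] * width for _ in range(height)])
            continue
        heatmaps.append([
            [
                math.exp(-((c - kp.x) ** 2 + (r - kp.y) ** 2) / (2 * sigma ** 2))
                for c in range(width)
            ]
            for r in range(height)
        ])
    return heatmaps


def cosine_distance(a, b):
    """Cosine distance between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(query_embedding, gallery):
    """Return gallery ids sorted from most to least similar to the query.

    `gallery` maps an id to its embedding; the embeddings are assumed to
    come from some ReID model that consumed the crop and its prompt.
    """
    return sorted(gallery, key=lambda gid: cosine_distance(query_embedding, gallery[gid]))


if __name__ == "__main__":
    # Two keypoints on a tiny 5x6 crop; the second one is marked as not visible.
    prompt = keypoints_to_heatmaps(
        [Keypoint(3, 2), Keypoint(0, 0, visible=False)], height=5, width=6
    )
    print(len(prompt), prompt[0][2][3])
    print(rank_gallery([1.0, 0.0], {"a": [0.0, 1.0], "b": [1.0, 0.1]}))
```

In this sketch the prompt is kept separate from the image, so the same crop containing several people can be paired with different keypoint sets to designate different targets, which is the ambiguity the abstract describes.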

Notes

  1. A “prompt” typically refers to an input instruction that guides the model’s response. In vision tasks, e.g. segmentation, prompts can be any kind of data that specifies what the model should focus on [15].

  2. Semantic keypoints are distinct points within a person image that carry semantic information about specific body parts, typically obtained with pose estimation [19].

  3. Inspired by the copy-paste technique for segmentation [9].

  4. We use the occluded (non-cropped) images as queries.

  5. The prompt tokenizer is completely removed at both training and inference.

  6. Occ-ReID and Partial-ReID have no train set, so Market-1501 is used for training.

References

  1. Alahi, A., Ramanathan, V., Fei-Fei, L.: Tracking millions of humans in crowded spaces. In: Group and Crowd Behavior for Computer Vision, pp. 115–135 (2017). https://doi.org/10.1016/b978-0-12-809276-7.00007-2

  2. Bergmann, P., Meinhardt, T., Leal-Taixe, L.: Tracking without bells and whistles. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 941–951. Institute of Electrical and Electronics Engineering (IEEE) (2019). https://doi.org/10.1109/iccv.2019.00103

  3. Chen, P., et al.: Occlude them all: occlusion-aware attention network for occluded person re-ID. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11813–11822. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01162

  4. Chen, W., et al.: Beyond appearance: a semantic controllable self-supervised learning framework for human-centric visual tasks. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15050–15061. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01445

  5. Doering, A., Chen, D., Zhang, S., Schiele, B., Gall, J.: PoseTrack21: a dataset for person search, multi-object tracking and multi-person pose tracking. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 20931–20940. Institute of Electrical and Electronics Engineering (IEEE), New Orleans (2022). https://doi.org/10.1109/cvpr52688.2022.02029

  6. Fu, D., et al.: Unsupervised pre-training for person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14745–14754. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.01451

  7. Gao, S., Wang, J., Lu, H., Liu, Z.: Pose-guided visible part matching for occluded person ReID. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11741–11749. Institute of Electrical and Electronics Engineering (IEEE), Seattle (2020). https://doi.org/10.1109/cvpr42600.2020.01176

  8. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: Exceeding YOLO series in 2021. arXiv arxiv:2107.08430 (2021). https://doi.org/10.48550/arXiv.2107.08430

  9. Ghiasi, G., et al.: Simple copy-paste is a strong data augmentation method for instance segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2917–2927. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.00294

  10. Giancola, S., et al.: SoccerNet 2022 challenges results. In: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. ACM (2022). https://doi.org/10.1145/3552437.3558545

  11. He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: TransReID: transformer-based object re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14993–15002. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01474

  12. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv arxiv:1703.07737 (2017). https://doi.org/10.48550/arXiv.1703.07737

  13. Istasse, M., Somers, V., Elancheliyan, P., De, J., Zambrano, D.: DeepSportradar-v2: a multi-sport computer vision dataset for sport understandings. In: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports. ACM (2023). https://doi.org/10.1145/3606038.3616160

  14. Jia, M., et al.: Matching on sets: conquer occluded person re-identification without alignment. In: AAAI Conference on Artificial Intelligence, vol. 35, pp. 1673–1681. Association for the Advancement of Artificial Intelligence (AAAI) (2021). https://doi.org/10.1609/aaai.v35i2.16260

  15. Kirillov, A., et al.: Segment anything. arXiv arxiv:2304.02643 (2023). https://doi.org/10.48550/arXiv.2304.02643

  16. Kreiss, S., Bertoni, L., Alahi, A.: PifPaf: composite fields for human pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11969–11978. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvpr.2019.01225

  17. Leal-Taixé, L., Milan, A., Reid, I., Roth, S., Schindler, K.: MOTChallenge 2015: towards a benchmark for multi-target tracking. arXiv arxiv:1504.01942 (2015). https://doi.org/10.48550/arXiv.1504.01942

  18. Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., Wu, F.: Diverse part discovery: occluded person re-identification with part-aware transformer. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2897–2906. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.00292

  19. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  20. Liu, Q., Xu, Z., Bertasius, G., Niethammer, M.: SimpleClick: interactive image segmentation with simple vision transformers. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 22233–22243. Institute of Electrical and Electronics Engineering (IEEE), Paris (2023). https://doi.org/10.1109/iccv51070.2023.02037

  21. Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9992–10002. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.00986

  22. Luiten, J., et al.: HOTA: a higher order metric for evaluating multi-object tracking. Int. J. Comput. Vision 1–31 (2020). https://doi.org/10.1007/s11263-020-01375-2

  23. Luo, H., Gu, Y., Liao, X., Lai, S., Jiang, W.: Bag of tricks and a strong baseline for deep person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1487–1495. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvprw.2019.00190

  24. Ma, Z., Zhao, Y., Li, J.: Pose-guided inter- and intra-part relational transformer for occluded person re-identification. In: Proceedings of the 29th ACM International Conference on Multimedia. ACM (2021). https://doi.org/10.1145/3474085.3475283

  25. Mansourian, A.M., Somers, V., De Vleeschouwer, C., Kasaei, S.: Multi-task learning for joint re-identification, team affiliation, and role classification for sports visual tracking. In: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports. ACM (2023). https://doi.org/10.1145/3606038.3616172

  26. Miao, J., Wu, Y., Liu, P., Ding, Y., Yang, Y.: Pose-guided feature alignment for occluded person re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 542–551. Institute of Electrical and Electronics Engineering (IEEE) (2019). https://doi.org/10.1109/iccv.2019.00063

  27. Peng, Y., Hou, S., Cao, C., Liu, X., Huang, Y., He, Z.: Deep learning-based occluded person re-identification: a survey. arXiv arxiv:2207.14452 (2022). https://doi.org/10.48550/arXiv.2207.14452

  28. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

  29. Seidenschwarz, J., Brasó, G., Serrano, V.C., Elezi, I., Leal-Taixé, L.: Simple cues lead to a strong multi-object tracker. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13813–13823. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01327

  30. Somers, V., et al.: SoccerNet game state reconstruction: end-to-end athlete tracking and identification on a minimap. In: IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), CVsports, Seattle, WA, USA (2024). https://doi.org/10.48550/ARXIV.2404.11335

  31. Somers, V., Vleeschouwer, C.D., Alahi, A.: Body part-based representation learning for occluded person re-identification. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1613–1623. Institute of Electrical and Electronics Engineering (IEEE), Waikoloa (2023). https://doi.org/10.1109/wacv56688.2023.00166

  32. Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5686–5696. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvpr.2019.00584

  33. Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 501–518. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_30

  34. Van Zandycke, G., Somers, V., Istasse, M., Don, C.D., Zambrano, D.: DeepSportradar-v1: computer vision dataset for sports understanding with high quality annotations. In: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. ACM (2022). https://doi.org/10.1145/3552437.3555699

  35. Wang, G., et al.: High-order information matters: learning relation and topology for occluded person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6448–6457. Institute of Electrical and Electronics Engineering (IEEE), Seattle (2020). https://doi.org/10.1109/cvpr42600.2020.00648

  36. Wang, T., Liu, H., Song, P., Guo, T., Shi, W.: Pose-guided feature disentangling for occluded person re-identification based on transformer. In: AAAI Conference on Artificial Intelligence, vol. 36, pp. 2540–2549. Association for the Advancement of Artificial Intelligence (AAAI) (2022). https://doi.org/10.1609/aaai.v36i3.20155

  37. Wang, Z., Zhu, F., Tang, S., Zhao, R., He, L., Song, J.: Feature erasing and diffusion network for occluded person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4744–4753. Institute of Electrical and Electronics Engineering (IEEE), New Orleans (2022). https://doi.org/10.1109/cvpr52688.2022.00471

  38. Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_7

  39. Yan, C., Pang, G., Jiao, J., Bai, X., Feng, X., Shen, C.: Occluded person re-identification with single-scale global representations. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11855–11864. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01166

  40. Yang, J., et al.: Learning to know where to see: a visibility-aware approach for occluded person re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11865–11874. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01167

  41. Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep learning for person re-identification: a survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 44(6), 2872–2893 (2022). https://doi.org/10.1109/tpami.2021.3054775

  42. Zhang, G., Zhang, Y., Zhang, T., Li, B., Pu, S.: PHA: patch-wise high-frequency augmentation for transformer-based person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14133–14142. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01358

  43. Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vision 129(11), 3069–3087 (2021). https://doi.org/10.1007/s11263-021-01513-4

  44. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: IEEE International Conference on Computer Vision (ICCV), pp. 1116–1124. Institute of Electrical and Electronics Engineering (IEEE), Santiago (2015). https://doi.org/10.1109/iccv.2015.133

  45. Zheng, W.S., Li, X., Xiang, T., Liao, S., Lai, J., Gong, S.: Partial person re-identification. In: IEEE International Conference on Computer Vision (ICCV), pp. 4678–4686. Institute of Electrical and Electronics Engineering (IEEE), Santiago (2015). https://doi.org/10.1109/iccv.2015.531

  46. Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3652–3661. Institute of Electrical and Electronics Engineering (IEEE), Honolulu (2017). https://doi.org/10.1109/cvpr.2017.389

  47. Zhu, K., Guo, H., Liu, Z., Tang, M., Wang, J.: Identity-guided human semantic parsing for person re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 346–363. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_21

  48. Zhuo, J., Chen, Z., Lai, J., Wang, G.: Occluded person re-identification. arXiv arxiv:1804.02792 (2018). https://doi.org/10.48550/arXiv.1804.02792

Acknowledgment

This work has been funded by Sportradar, by the Walloon region project ReconnAIssance, and by the FNRS.

Author information

Corresponding author

Correspondence to Vladimir Somers.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 786 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Somers, V., Alahi, A., Vleeschouwer, C.D. (2025). Keypoint Promptable Re-Identification. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15137. Springer, Cham. https://doi.org/10.1007/978-3-031-72986-7_13

  • DOI: https://doi.org/10.1007/978-3-031-72986-7_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72985-0

  • Online ISBN: 978-3-031-72986-7

  • eBook Packages: Computer Science (R0)
