Abstract
Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance. While many studies have tackled occlusions caused by objects, multi-person occlusions remain less explored. In this work, we identify and address a critical challenge overlooked by previous occluded ReID methods: the Multi-Person Ambiguity (MPA) arising when multiple individuals are visible in the same bounding box, making it impossible to determine the intended ReID target among the candidates. Inspired by recent work on prompting in vision, we introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Since promptable re-identification is an unexplored paradigm, existing ReID datasets lack the pixel-level annotations necessary for prompting. To bridge this gap and foster further research on this topic, we introduce Occluded PoseTrack-ReID, a novel ReID dataset with keypoint labels that features strong inter-person occlusions. Furthermore, we release custom keypoint labels for four popular ReID benchmarks. Experiments on person retrieval, as well as on pose tracking, demonstrate that our method systematically surpasses previous state-of-the-art approaches in various occluded scenarios. Our code, dataset and annotations are available at https://github.com/VlSomers/keypoint_promptable_reidentification.
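To make the idea of complementing a bounding box with keypoints more concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that the prompt is rendered as one Gaussian heatmap per keypoint and stacked with the image crop as extra input channels. The function names, the `sigma` value, the crop size and the `(x, y, visible)` keypoint layout are all hypothetical.

```python
import numpy as np


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render K (x, y, visible) keypoints as K Gaussian heatmaps of shape (K, H, W)."""
    ys, xs = np.mgrid[0:height, 0:width]  # pixel coordinate grids
    heatmaps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, (x, y, visible) in enumerate(keypoints):
        if not visible:
            continue  # an unlabeled keypoint keeps an all-zero channel
        squared_dist = (xs - x) ** 2 + (ys - y) ** 2
        heatmaps[k] = np.exp(-squared_dist / (2.0 * sigma ** 2))
    return heatmaps


def build_prompted_input(image, keypoints, sigma=2.0):
    """Stack an RGB crop (H, W, 3) with its keypoint heatmaps into (3 + K, H, W)."""
    height, width, _ = image.shape
    rgb = np.transpose(image.astype(np.float32) / 255.0, (2, 0, 1))
    prompt = keypoints_to_heatmaps(keypoints, height, width, sigma)
    return np.concatenate([rgb, prompt], axis=0)


# Hypothetical 64x32 crop in which two people overlap; the keypoints below
# belong to the intended target only, which is what resolves the ambiguity.
crop = np.zeros((64, 32, 3), dtype=np.uint8)
target_keypoints = [(16, 8, 1), (10, 20, 1), (0, 0, 0)]  # (x, y, visible)
prompted = build_prompted_input(crop, target_keypoints)
```

In this sketch, the same crop paired with a different person's keypoints yields a different input, so a model consuming it could in principle produce a target-specific embedding. How the prompt is actually fed to the network is a design choice this example does not settle.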
Notes
- 1. A “prompt” typically refers to an input instruction that guides the model’s response. In vision tasks, e.g., segmentation, prompts can be any kind of data that specifies what the model should focus on [15].
- 2. Semantic keypoints are distinct points within a person image that carry semantic information about specific body parts, typically obtained with pose estimation [19].
- 3. Inspired by the copy-paste technique for segmentation [9].
- 4. We use the occluded (non-cropped) images as queries.
- 5. The prompt tokenizer is completely removed at both training and inference.
- 6. Occ-ReID and Partial-ReID have no train set, so Market-1501 is used for training.
References
Alahi, A., Ramanathan, V., Fei-Fei, L.: Tracking millions of humans in crowded spaces. In: Group and Crowd Behavior for Computer Vision, pp. 115–135 (2017). https://doi.org/10.1016/b978-0-12-809276-7.00007-2
Bergmann, P., Meinhardt, T., Leal-Taixe, L.: Tracking without bells and whistles. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 941–951. Institute of Electrical and Electronics Engineering (IEEE) (2019). https://doi.org/10.1109/iccv.2019.00103
Chen, P., et al.: Occlude them all: occlusion-aware attention network for occluded person re-ID. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11813–11822. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01162
Chen, W., et al.: Beyond appearance: a semantic controllable self-supervised learning framework for human-centric visual tasks. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15050–15061. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01445
Doering, A., Chen, D., Zhang, S., Schiele, B., Gall, J.: PoseTrack21: a dataset for person search, multi-object tracking and multi-person pose tracking. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 20931–20940. Institute of Electrical and Electronics Engineering (IEEE), New Orleans (2022). https://doi.org/10.1109/cvpr52688.2022.02029
Fu, D., et al.: Unsupervised pre-training for person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14745–14754. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.01451
Gao, S., Wang, J., Lu, H., Liu, Z.: Pose-guided visible part matching for occluded person ReID. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11741–11749. Institute of Electrical and Electronics Engineering (IEEE), Seattle (2020). https://doi.org/10.1109/cvpr42600.2020.01176
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: Exceeding YOLO series in 2021. arXiv:2107.08430 (2021). https://doi.org/10.48550/arXiv.2107.08430
Ghiasi, G., et al.: Simple copy-paste is a strong data augmentation method for instance segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2917–2927. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.00294
Giancola, S., et al.: SoccerNet 2022 challenges results. In: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. ACM (2022). https://doi.org/10.1145/3552437.3558545
He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: TransReID: transformer-based object re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14993–15002. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01474
Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv:1703.07737 (2017). https://doi.org/10.48550/arXiv.1703.07737
Istasse, M., Somers, V., Elancheliyan, P., De, J., Zambrano, D.: DeepSportradar-v2: a multi-sport computer vision dataset for sport understandings. In: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports. ACM (2023). https://doi.org/10.1145/3606038.3616160
Jia, M., et al.: Matching on sets: conquer occluded person re-identification without alignment. In: AAAI Conference on Artificial Intelligence, vol. 35, pp. 1673–1681. Association for the Advancement of Artificial Intelligence (AAAI) (2021). https://doi.org/10.1609/aaai.v35i2.16260
Kirillov, A., et al.: Segment anything. arXiv:2304.02643 (2023). https://doi.org/10.48550/arXiv.2304.02643
Kreiss, S., Bertoni, L., Alahi, A.: PifPaf: composite fields for human pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11969–11978. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvpr.2019.01225
Leal-Taixé, L., Milan, A., Reid, I., Roth, S., Schindler, K.: MOTChallenge 2015: towards a benchmark for multi-target tracking. arXiv:1504.01942 (2015). https://doi.org/10.48550/arXiv.1504.01942
Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., Wu, F.: Diverse part discovery: occluded person re-identification with part-aware transformer. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2897–2906. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.00292
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, Q., Xu, Z., Bertasius, G., Niethammer, M.: SimpleClick: interactive image segmentation with simple vision transformers. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 22233–22243. Institute of Electrical and Electronics Engineering (IEEE), Paris (2023). https://doi.org/10.1109/iccv51070.2023.02037
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9992–10002. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.00986
Luiten, J., et al.: HOTA: a higher order metric for evaluating multi-object tracking. Int. J. Comput. Vision 1–31 (2020). https://doi.org/10.1007/s11263-020-01375-2
Luo, H., Gu, Y., Liao, X., Lai, S., Jiang, W.: Bag of tricks and a strong baseline for deep person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1487–1495. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvprw.2019.00190
Ma, Z., Zhao, Y., Li, J.: Pose-guided inter- and intra-part relational transformer for occluded person re-identification. In: Proceedings of the 29th ACM International Conference on Multimedia. ACM (2021). https://doi.org/10.1145/3474085.3475283
Mansourian, A.M., Somers, V., De Vleeschouwer, C., Kasaei, S.: Multi-task learning for joint re-identification, team affiliation, and role classification for sports visual tracking. In: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports. ACM (2023). https://doi.org/10.1145/3606038.3616172
Miao, J., Wu, Y., Liu, P., Ding, Y., Yang, Y.: Pose-guided feature alignment for occluded person re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 542–551. Institute of Electrical and Electronics Engineering (IEEE) (2019). https://doi.org/10.1109/iccv.2019.00063
Peng, Y., Hou, S., Cao, C., Liu, X., Huang, Y., He, Z.: Deep learning-based occluded person re-identification: a survey. arXiv:2207.14452 (2022). https://doi.org/10.48550/arXiv.2207.14452
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Seidenschwarz, J., Brasó, G., Serrano, V.C., Elezi, I., Leal-Taixé, L.: Simple cues lead to a strong multi-object tracker. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13813–13823. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01327
Somers, V., et al.: SoccerNet game state reconstruction: end-to-end athlete tracking and identification on a minimap. In: IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), CVsports, Seattle, WA, USA (2024). https://doi.org/10.48550/ARXIV.2404.11335
Somers, V., Vleeschouwer, C.D., Alahi, A.: Body part-based representation learning for occluded person re-identification. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1613–1623. Institute of Electrical and Electronics Engineering (IEEE), Waikoloa (2023). https://doi.org/10.1109/wacv56688.2023.00166
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5686–5696. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvpr.2019.00584
Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 501–518. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_30
Van Zandycke, G., Somers, V., Istasse, M., Don, C.D., Zambrano, D.: DeepSportradar-v1: computer vision dataset for sports understanding with high quality annotations. In: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. ACM (2022). https://doi.org/10.1145/3552437.3555699
Wang, G., et al.: High-order information matters: learning relation and topology for occluded person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6448–6457. Institute of Electrical and Electronics Engineering (IEEE), Seattle (2020). https://doi.org/10.1109/cvpr42600.2020.00648
Wang, T., Liu, H., Song, P., Guo, T., Shi, W.: Pose-guided feature disentangling for occluded person re-identification based on transformer. In: AAAI Conference on Artificial Intelligence, vol. 36, pp. 2540–2549. Association for the Advancement of Artificial Intelligence (AAAI) (2022). https://doi.org/10.1609/aaai.v36i3.20155
Wang, Z., Zhu, F., Tang, S., Zhao, R., He, L., Song, J.: Feature erasing and diffusion network for occluded person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4744–4753. Institute of Electrical and Electronics Engineering (IEEE), New Orleans (2022). https://doi.org/10.1109/cvpr52688.2022.00471
Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_7
Yan, C., Pang, G., Jiao, J., Bai, X., Feng, X., Shen, C.: Occluded person re-identification with single-scale global representations. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11855–11864. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01166
Yang, J., et al.: Learning to know where to see: a visibility-aware approach for occluded person re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11865–11874. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01167
Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep learning for person re-identification: a survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 44(6), 2872–2893 (2022). https://doi.org/10.1109/tpami.2021.3054775
Zhang, G., Zhang, Y., Zhang, T., Li, B., Pu, S.: PHA: patch-wise high-frequency augmentation for transformer-based person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14133–14142. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01358
Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vision 129(11), 3069–3087 (2021). https://doi.org/10.1007/s11263-021-01513-4
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: IEEE International Conference on Computer Vision (ICCV), pp. 1116–1124. Institute of Electrical and Electronics Engineering (IEEE), Santiago (2015). https://doi.org/10.1109/iccv.2015.133
Zheng, W.S., Li, X., Xiang, T., Liao, S., Lai, J., Gong, S.: Partial person re-identification. In: IEEE International Conference on Computer Vision (ICCV), pp. 4678–4686. Institute of Electrical and Electronics Engineering (IEEE), Santiago (2015). https://doi.org/10.1109/iccv.2015.531
Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3652–3661. Institute of Electrical and Electronics Engineering (IEEE), Honolulu (2017). https://doi.org/10.1109/cvpr.2017.389
Zhu, K., Guo, H., Liu, Z., Tang, M., Wang, J.: Identity-guided human semantic parsing for person re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 346–363. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_21
Zhuo, J., Chen, Z., Lai, J., Wang, G.: Occluded person re-identification. arXiv:1804.02792 (2018). https://doi.org/10.48550/arXiv.1804.02792
Acknowledgment
This work has been funded by Sportradar, by the Walloon region project ReconnAIssance, and by the FNRS.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Somers, V., Alahi, A., Vleeschouwer, C.D. (2025). Keypoint Promptable Re-Identification. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15137. Springer, Cham. https://doi.org/10.1007/978-3-031-72986-7_13
DOI: https://doi.org/10.1007/978-3-031-72986-7_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72985-0
Online ISBN: 978-3-031-72986-7
eBook Packages: Computer Science, Computer Science (R0)