Keypoint Promptable Re-Identification

Somers, Vladimir; Alahi, Alexandre; Vleeschouwer, Christophe De

doi:10.1007/978-3-031-72986-7_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15137)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance. While many studies have tackled occlusions caused by objects, multi-person occlusions remain less explored. In this work, we identify and address a critical challenge overlooked by previous occluded ReID methods: the Multi-Person Ambiguity (MPA) that arises when multiple individuals are visible in the same bounding box, making it impossible to determine the intended ReID target among the candidates. Inspired by recent work on prompting in vision, we introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints indicating the intended target. Since promptable re-identification is an unexplored paradigm, existing ReID datasets lack the pixel-level annotations necessary for prompting. To bridge this gap and foster further research on this topic, we introduce Occluded PoseTrack-ReID, a novel ReID dataset with keypoint labels that features strong inter-person occlusions. Furthermore, we release custom keypoint labels for four popular ReID benchmarks. Experiments on person retrieval and on pose tracking demonstrate that our method systematically surpasses previous state-of-the-art approaches in various occluded scenarios. Our code, dataset and annotations are available at https://github.com/VlSomers/keypoint_promptable_reidentification.
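The abstract describes complementing the input bounding box with semantic keypoints that indicate the intended target. One common, generic way to hand point prompts to an image backbone is to render each keypoint as a Gaussian heatmap and stack the heatmaps as extra input channels. The sketch below assumes that encoding purely for illustration; it is not necessarily how KPR encodes its prompts, and the helper names `keypoints_to_heatmaps` and `add_prompt_channels`, the `sigma` value and the `(x, y, visibility)` keypoint layout are all hypothetical.

```python
import numpy as np


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render (x, y, visibility) keypoints as one Gaussian heatmap each.

    Hypothetical prompt encoding: coordinates are in pixels of the cropped
    bounding box; keypoints with visibility == 0 produce an all-zero map.
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)  # pixel grids
    heatmaps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, (x, y, vis) in enumerate(keypoints):
        if vis > 0:
            heatmaps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma**2))
    return heatmaps


def add_prompt_channels(image, keypoints, sigma=2.0):
    """Concatenate an (H, W, 3) crop with its K keypoint heatmaps -> (H, W, 3 + K)."""
    h, w = image.shape[:2]
    heatmaps = keypoints_to_heatmaps(keypoints, h, w, sigma)
    return np.concatenate(
        [image.astype(np.float32), heatmaps.transpose(1, 2, 0)], axis=-1
    )


if __name__ == "__main__":
    crop = np.zeros((256, 128, 3), dtype=np.uint8)  # H=256, W=128 person crop
    # Two visible keypoints on the intended target, one invisible keypoint.
    prompt = [(64.0, 40.0, 1.0), (60.0, 120.0, 1.0), (0.0, 0.0, 0.0)]
    print(add_prompt_channels(crop, prompt).shape)  # (256, 128, 6)
```

Because the heatmaps are tied to one person's body parts, two crops with identical pixels but different prompts yield different inputs, which is the property needed to resolve the Multi-Person Ambiguity described above.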

Notes

1. A “prompt” typically refers to an input instruction that guides the model’s response. In vision tasks, e.g. segmentation, prompts can be any kind of data that specifies what the model should focus on [15].
2. Semantic keypoints are distinct points within a person image that carry semantic information about specific body parts, typically obtained with pose estimation [19].
3. Inspired by the copy-paste technique for segmentation [9].
4. We use the occluded (non-cropped) images as queries.
5. The prompt tokenizer is completely removed at both training and inference.
6. Occ-ReID and Partial-ReID have no training set, so Market-1501 is used for training.
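Note 3 mentions the copy-paste technique for segmentation [9]. As a rough illustration of that general technique applied to person crops, the sketch below pastes a masked crop of another person at a random position over a training image, simulating an inter-person occlusion. The helper name `paste_occluder`, the binary-mask compositing and the uniform placement policy are assumptions for illustration only, not the paper's exact procedure.

```python
import numpy as np


def paste_occluder(target, occluder, occluder_mask, rng):
    """Copy-paste a masked person crop onto a target image (hypothetical helper).

    target: (H, W, 3) uint8 image; occluder: (h, w, 3) uint8 with h <= H, w <= W;
    occluder_mask: (h, w) bool, True where the pasted person's pixels are.
    Returns the augmented image and an (H, W) bool mask of the occluded pixels.
    """
    H, W = target.shape[:2]
    h, w = occluder.shape[:2]
    # Uniform random placement that keeps the occluder fully inside the target.
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    out = target.copy()
    region = out[top:top + h, left:left + w]  # basic slicing: a view into `out`
    region[occluder_mask] = occluder[occluder_mask]  # overwrite only person pixels
    occluded = np.zeros((H, W), dtype=bool)
    occluded[top:top + h, left:left + w] = occluder_mask
    return out, occluded


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.zeros((256, 128, 3), dtype=np.uint8)
    occluder = np.full((100, 60, 3), 255, dtype=np.uint8)
    mask = np.zeros((100, 60), dtype=bool)
    mask[10:90, 10:50] = True  # an 80 x 40 "person" silhouette
    augmented, occluded = paste_occluder(target, occluder, mask, rng)
    print(occluded.sum())  # 3200
```

Returning the occlusion mask alongside the image lets a caller, for example, adjust labels for target pixels that end up hidden; whether and how KPR uses such a mask is not stated in the notes.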

References

Alahi, A., Ramanathan, V., Fei-Fei, L.: Tracking millions of humans in crowded spaces. In: Group and Crowd Behavior for Computer Vision, pp. 115–135 (2017). https://doi.org/10.1016/b978-0-12-809276-7.00007-2
Bergmann, P., Meinhardt, T., Leal-Taixe, L.: Tracking without bells and whistles. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 941–951. Institute of Electrical and Electronics Engineering (IEEE) (2019). https://doi.org/10.1109/iccv.2019.00103
Chen, P., et al.: Occlude them all: occlusion-aware attention network for occluded person re-ID. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11813–11822. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01162
Chen, W., et al.: Beyond appearance: a semantic controllable self-supervised learning framework for human-centric visual tasks. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15050–15061. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01445
Doering, A., Chen, D., Zhang, S., Schiele, B., Gall, J.: PoseTrack21: a dataset for person search, multi-object tracking and multi-person pose tracking. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 20931–20940. Institute of Electrical and Electronics Engineering (IEEE), New Orleans (2022). https://doi.org/10.1109/cvpr52688.2022.02029
Fu, D., et al.: Unsupervised pre-training for person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14745–14754. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.01451
Gao, S., Wang, J., Lu, H., Liu, Z.: Pose-guided visible part matching for occluded person ReID. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11741–11749. Institute of Electrical and Electronics Engineering (IEEE), Seattle (2020). https://doi.org/10.1109/cvpr42600.2020.01176
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: YOLOX: Exceeding YOLO series in 2021. arXiv arxiv:2107.08430 (2021). https://doi.org/10.48550/arXiv.2107.08430
Ghiasi, G., et al.: Simple copy-paste is a strong data augmentation method for instance segmentation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2917–2927. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.00294
Giancola, S., et al.: SoccerNet 2022 challenges results. In: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. ACM (2022). https://doi.org/10.1145/3552437.3558545
He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: TransReID: transformer-based object re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14993–15002. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01474
Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv arxiv:1703.07737 (2017). https://doi.org/10.48550/arXiv.1703.07737
Istasse, M., Somers, V., Elancheliyan, P., De, J., Zambrano, D.: DeepSportradar-v2: a multi-sport computer vision dataset for sport understandings. In: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports. ACM (2023). https://doi.org/10.1145/3606038.3616160
Jia, M., et al.: Matching on sets: conquer occluded person re-identification without alignment. In: AAAI Conference on Artificial Intelligence, vol. 35, pp. 1673–1681. Association for the Advancement of Artificial Intelligence (AAAI) (2021). https://doi.org/10.1609/aaai.v35i2.16260
Kirillov, A., et al.: Segment anything. arXiv arxiv:2304.02643 (2023). https://doi.org/10.48550/arXiv.2304.02643
Kreiss, S., Bertoni, L., Alahi, A.: PifPaf: composite fields for human pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11969–11978. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvpr.2019.01225
Leal-Taixé, L., Milan, A., Reid, I., Roth, S., Schindler, K.: MOTChallenge 2015: towards a benchmark for multi-target tracking. arXiv arxiv:1504.01942 (2015). https://doi.org/10.48550/arXiv.1504.01942
Li, Y., He, J., Zhang, T., Liu, X., Zhang, Y., Wu, F.: Diverse part discovery: occluded person re-identification with part-aware transformer. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2897–2906. Institute of Electrical and Electronics Engineering (IEEE), Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.00292
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, Q., Xu, Z., Bertasius, G., Niethammer, M.: SimpleClick: interactive image segmentation with simple vision transformers. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 22233–22243. Institute of Electrical and Electronics Engineering (IEEE), Paris (2023). https://doi.org/10.1109/iccv51070.2023.02037
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9992–10002. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.00986
Luiten, J., et al.: HOTA: a higher order metric for evaluating multi-object tracking. Int. J. Comput. Vision 1–31 (2020). https://doi.org/10.1007/s11263-020-01375-2
Luo, H., Gu, Y., Liao, X., Lai, S., Jiang, W.: Bag of tricks and a strong baseline for deep person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1487–1495. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvprw.2019.00190
Ma, Z., Zhao, Y., Li, J.: Pose-guided inter- and intra-part relational transformer for occluded person re-identification. In: Proceedings of the 29th ACM International Conference on Multimedia. ACM (2021). https://doi.org/10.1145/3474085.3475283
Mansourian, A.M., Somers, V., De Vleeschouwer, C., Kasaei, S.: Multi-task learning for joint re-identification, team affiliation, and role classification for sports visual tracking. In: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports. ACM (2023). https://doi.org/10.1145/3606038.3616172
Miao, J., Wu, Y., Liu, P., Ding, Y., Yang, Y.: Pose-guided feature alignment for occluded person re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 542–551. Institute of Electrical and Electronics Engineering (IEEE) (2019). https://doi.org/10.1109/iccv.2019.00063
Peng, Y., Hou, S., Cao, C., Liu, X., Huang, Y., He, Z.: Deep learning-based occluded person re-identification: a survey. arXiv arxiv:2207.14452 (2022). https://doi.org/10.48550/arXiv.2207.14452
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Seidenschwarz, J., Brasó, G., Serrano, V.C., Elezi, I., Leal-Taixé, L.: Simple cues lead to a strong multi-object tracker. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13813–13823. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01327
Somers, V., et al.: SoccerNet game state reconstruction: end-to-end athlete tracking and identification on a minimap. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), CVsports, Seattle, WA, USA (2024). https://doi.org/10.48550/ARXIV.2404.11335
Somers, V., Vleeschouwer, C.D., Alahi, A.: Body part-based representation learning for occluded person re-identification. In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 1613–1623. Institute of Electrical and Electronics Engineering (IEEE), Waikoloa (2023). https://doi.org/10.1109/wacv56688.2023.00166
Sun, K., Xiao, B., Liu, D., Wang, J.: Deep high-resolution representation learning for human pose estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5686–5696. Institute of Electrical and Electronics Engineering (IEEE), Long Beach (2019). https://doi.org/10.1109/cvpr.2019.00584
Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 501–518. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_30
Van Zandycke, G., Somers, V., Istasse, M., Don, C.D., Zambrano, D.: DeepSportradar-v1: computer vision dataset for sports understanding with high quality annotations. In: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports. ACM (2022). https://doi.org/10.1145/3552437.3555699
Wang, G., et al.: High-order information matters: learning relation and topology for occluded person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6448–6457. Institute of Electrical and Electronics Engineering (IEEE), Seattle (2020). https://doi.org/10.1109/cvpr42600.2020.00648
Wang, T., Liu, H., Song, P., Guo, T., Shi, W.: Pose-guided feature disentangling for occluded person re-identification based on transformer. In: AAAI Conference on Artificial Intelligence, vol. 36, pp. 2540–2549. Association for the Advancement of Artificial Intelligence (AAAI) (2022). https://doi.org/10.1609/aaai.v36i3.20155
Wang, Z., Zhu, F., Tang, S., Zhao, R., He, L., Song, J.: Feature erasing and diffusion network for occluded person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4744–4753. Institute of Electrical and Electronics Engineering (IEEE), New Orleans (2022). https://doi.org/10.1109/cvpr52688.2022.00471
Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_7
Yan, C., Pang, G., Jiao, J., Bai, X., Feng, X., Shen, C.: Occluded person re-identification with single-scale global representations. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11855–11864. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01166
Yang, J., et al.: Learning to know where to see: a visibility-aware approach for occluded person re-identification. In: IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11865–11874. Institute of Electrical and Electronics Engineering (IEEE), Montreal (2021). https://doi.org/10.1109/iccv48922.2021.01167
Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep learning for person re-identification: a survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell. 44(6), 2872–2893 (2022). https://doi.org/10.1109/tpami.2021.3054775
Zhang, G., Zhang, Y., Zhang, T., Li, B., Pu, S.: PHA: patch-wise high-frequency augmentation for transformer-based person re-identification. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14133–14142. Institute of Electrical and Electronics Engineering (IEEE), Vancouver (2023). https://doi.org/10.1109/cvpr52729.2023.01358
Zhang, Y., Wang, C., Wang, X., Zeng, W., Liu, W.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vision 129(11), 3069–3087 (2021). https://doi.org/10.1007/s11263-021-01513-4
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: IEEE International Conference on Computer Vision (ICCV), pp. 1116–1124. Institute of Electrical and Electronics Engineering (IEEE), Santiago (2015). https://doi.org/10.1109/iccv.2015.133
Zheng, W.S., Li, X., Xiang, T., Liao, S., Lai, J., Gong, S.: Partial person re-identification. In: IEEE International Conference on Computer Vision (ICCV), pp. 4678–4686. Institute of Electrical and Electronics Engineering (IEEE), Santiago (2015). https://doi.org/10.1109/iccv.2015.531
Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3652–3661. Institute of Electrical and Electronics Engineering (IEEE), Honolulu (2017). https://doi.org/10.1109/cvpr.2017.389
Zhu, K., Guo, H., Liu, Z., Tang, M., Wang, J.: Identity-guided human semantic parsing for person re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12348, pp. 346–363. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58580-8_21
Zhuo, J., Chen, Z., Lai, J., Wang, G.: Occluded person re-identification. arXiv arxiv:1804.02792 (2018). https://doi.org/10.48550/arXiv.1804.02792

Acknowledgment

This work has been funded by Sportradar, by the Walloon region project ReconnAIssance, and by the FNRS.

Author information

Authors and Affiliations

Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Vladimir Somers & Alexandre Alahi
Université Catholique de Louvain (UCLouvain), Ottignies-Louvain-la-Neuve, Belgium
Vladimir Somers & Christophe De Vleeschouwer
Sportradar, St. Gallen, Switzerland
Vladimir Somers

Corresponding author

Correspondence to Vladimir Somers.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Supplementary material 1 (pdf 786 KB)

About this paper

Cite this paper

Somers, V., Alahi, A., Vleeschouwer, C.D. (2025). Keypoint Promptable Re-Identification. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15137. Springer, Cham. https://doi.org/10.1007/978-3-031-72986-7_13

DOI: https://doi.org/10.1007/978-3-031-72986-7_13
Published: 02 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72985-0
Online ISBN: 978-3-031-72986-7
eBook Packages: Computer Science, Computer Science (R0)