Abstract
Person re-identification (ReID) aims to retrieve a target pedestrian from an image gallery captured by cameras in varied scenarios. To achieve good performance, a ReID model must extract rich, discriminative feature representations from images. Most current methods focus on mining the information that identifies a pedestrian from a single image by investigating different dimensions of that image. However, a single image is sometimes insufficient to characterize all the features needed to identify a pedestrian, especially when data quality is not guaranteed. Since a pedestrian tends to appear in numerous images, information missing from one image can be supplemented from the others. We therefore extract more robust feature representations by exploiting the relationships between multiple pedestrian images, and propose a new method, DTMIReID. First, we propose a Transformer-based Dual Branch Attention Module (DBAM) to extract global and local features from single images. We then combine the features extracted from multiple images and feed them into our proposed Deformable Transformer Module (DTM), which simultaneously fuses the global and local features of these images through a Sample-Points-Based Attention (SPBA) mechanism. To the best of our knowledge, our method is the first ReID model that uses the Deformable Transformer to establish relationships between multiple features. Experimental results on four large ReID datasets show that the new method outperforms published state-of-the-art methods by a large margin. DTMIReID is available at https://github.com/Titaniumyh/DTMIReID.git.
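The abstract does not give the details of SPBA, so the following is only a minimal sketch of the general idea behind sampling-point attention in deformable transformers: a query attends to a few sampled positions around a reference point rather than to every token. It assumes a 1D sequence of feature tokens concatenated from several images, and it takes the sampling offsets and attention logits as plain inputs, whereas a real model would predict them from the query. The names softmax, sample_linear and sampled_attention are hypothetical and are not taken from the DTMIReID code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_linear(tokens, pos):
    """Linearly interpolate a feature vector at fractional index pos.

    Positions outside the sequence are clamped to its ends.
    """
    pos = min(max(pos, 0.0), len(tokens) - 1.0)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(tokens) - 1)
    frac = pos - lo
    return [(1 - frac) * a + frac * b for a, b in zip(tokens[lo], tokens[hi])]


def sampled_attention(tokens, ref, offsets, logits):
    """Fuse features sampled at ref + offset, weighted by softmax(logits).

    Assumption: offsets and logits are given here; in a trained model
    they would be produced by learned layers applied to the query.
    """
    weights = softmax(logits)
    out = [0.0] * len(tokens[0])
    for off, w in zip(offsets, weights):
        value = sample_linear(tokens, ref + off)
        for d in range(len(out)):
            out[d] += w * value[d]
    return out


# Toy features: two tokens from image A followed by two tokens from image B.
image_a = [[1.0, 0.0], [0.0, 1.0]]
image_b = [[2.0, 2.0], [4.0, 0.0]]
tokens = image_a + image_b

# A query anchored at index 1 samples three positions spanning both images.
fused = sampled_attention(tokens, ref=1.0,
                          offsets=[-1.0, 0.5, 2.0],
                          logits=[0.0, 0.0, 0.0])
print(fused)  # approximately [2.0, 0.5] with equal weights
```

With equal logits the output is the plain average of the three sampled vectors. Because each query touches only a handful of positions, the cost per query does not grow with the number of tokens, which is the usual motivation for this style of attention when features from many images are combined.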
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61672325. We sincerely thank the anonymous reviewers for their valuable comments and suggestions.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, H., Feng, H., Cui, X. (2025). DTMIReID: Person Re-identification Based on Deformable Transformer to Incorporate Mutual Information Between Images. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_29
DOI: https://doi.org/10.1007/978-3-031-78341-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78340-1
Online ISBN: 978-3-031-78341-8
eBook Packages: Computer Science (R0)