DTMIReID: Person Re-identification Based on Deformable Transformer to Incorporate Mutual Information Between Images

Yang, Han; Feng, Haodi; Cui, Xuefeng

doi:10.1007/978-3-031-78341-8_29

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15314)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

Person Re-identification (ReID) aims to retrieve a target pedestrian from an image gallery captured by cameras in varied scenarios. To achieve good performance, a ReID model must extract rich, discriminative feature representations from images. Most current methods focus on mining information that identifies a pedestrian from a single image by investigating different dimensions of that image. However, a single image is sometimes insufficient to characterize all the features needed to identify a pedestrian, especially when data quality is not guaranteed. Since a pedestrian tends to appear in numerous images, information missing from one image can be supplemented from others. We therefore extract more robust feature representations by exploiting relationships between multiple pedestrian images, and propose a new method, DTMIReID. First, we propose a Dual Branch Attention Module (DBAM), based on the Transformer, to extract global and local features from single images. We then combine the extracted features of multiple images and feed them into our proposed Deformable Transformer Module (DTM), which simultaneously fuses the global and local features from these images through a Sample-Points-Based Attention (SPBA) mechanism. To the best of our knowledge, our method is the first ReID model that uses the Deformable Transformer to establish relationships between multiple features. Experimental results on four large ReID datasets show that the new method outperforms state-of-the-art published works by a large margin. DTMIReID is available at https://github.com/Titaniumyh/DTMIReID.git.
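
To give a rough sense of what attention over a small set of sampled points can look like, the following NumPy sketch implements a generic, single-head, deformable-style attention over a flat sequence of tokens. It is not the authors' SPBA or DTM implementation, which the abstract does not specify. The function name `sample_points_attention`, the 1D token layout (features from several images simply stacked), the projection matrices, and the linear interpolation are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sample_points_attention(tokens, ref_points, w_offset, w_attn, w_value):
    """Deformable-style attention over a 1D token sequence (illustrative only).

    tokens:     (N, D) features stacked from several images (assumed layout)
    ref_points: (N,)   reference position of each query, in [0, N - 1]
    w_offset:   (D, K) maps each query to K sampling offsets
    w_attn:     (D, K) maps each query to K attention logits
    w_value:    (D, D) value projection
    Returns an (N, D) array of fused features.
    """
    n = tokens.shape[0]
    values = tokens @ w_value                    # (N, D)
    offsets = tokens @ w_offset                  # (N, K)
    weights = softmax(tokens @ w_attn, axis=-1)  # (N, K), each row sums to 1

    # Each query attends to K sampled locations instead of all N tokens.
    loc = np.clip(ref_points[:, None] + offsets, 0.0, n - 1.0)  # (N, K)
    lo = np.floor(loc).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = (loc - lo)[..., None]                 # (N, K, 1)

    # Linear interpolation between the two neighbouring tokens.
    sampled = (1.0 - frac) * values[lo] + frac * values[hi]  # (N, K, D)
    return (weights[..., None] * sampled).sum(axis=1)        # (N, D)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, dim, n_points = 8, 16, 4
    tokens = rng.normal(size=(n_tokens, dim))
    fused = sample_points_attention(
        tokens,
        np.arange(n_tokens, dtype=float),
        0.5 * rng.normal(size=(dim, n_points)),
        rng.normal(size=(dim, n_points)),
        np.eye(dim),
    )
    print(fused.shape)
```

Because each query reads only `K` interpolated locations, the cost grows with `N * K` rather than `N * N`, which is the usual motivation for deformable attention when many tokens from many images are pooled together.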

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 61672325. We sincerely thank the anonymous reviewers for their valuable comments and suggestions.

Author information

Authors and Affiliations

Shandong University, Jinan, 250100, Shandong, People’s Republic of China
Han Yang, Haodi Feng & Xuefeng Cui

Corresponding author

Correspondence to Haodi Feng.

Editor information

Editors and Affiliations

University of Salford, Salford, Lancashire, UK
Apostolos Antonacopoulos
Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
Subhasis Chaudhuri
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
IIT Kharagpur, Kharagpur, West Bengal, India
Saumik Bhattacharya
Indian Statistical Institute Kolkata, Kolkata, West Bengal, India
Umapada Pal

About this paper

Cite this paper

Yang, H., Feng, H., Cui, X. (2025). DTMIReID: Person Re-identification Based on Deformable Transformer to Incorporate Mutual Information Between Images. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15314. Springer, Cham. https://doi.org/10.1007/978-3-031-78341-8_29

DOI: https://doi.org/10.1007/978-3-031-78341-8_29
Published: 02 December 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78340-1
Online ISBN: 978-3-031-78341-8
eBook Packages: Computer Science, Computer Science (R0)
