Abstract
Cross-modality person re-identification between RGB and infrared (IR) images is challenging because of the substantial discrepancy between the two modalities. Existing approaches typically focus on learning either modality-specific or modality-shared features; overemphasizing the former can hinder cross-modality matching, whereas the latter is more beneficial to the task. To address this, we propose CM-DASN (Cross-Modality Dynamic Attention Selection Network), a novel approach based on dynamic attention optimization. Its core is the Dynamic Attention Selection Module (DASM), which adaptively selects the most effective combination of attention heads in the later stages of training, thereby balancing the learning of modality-shared and modality-specific features. A softmax score-based feature selection mechanism extracts and enhances the most discriminative cross-modality feature representations. By alternately supervising high-scoring modality-shared and modality-specific features in the later training stages, the model concentrates on learning highly discriminative modality-shared features while retaining beneficial modality-specific information. We further design a multi-stage, multi-scale cross-modality feature alignment strategy that aligns features of different scales in a phased, progressive manner, capturing both global structure and local detail and thereby improving cross-modality person re-identification performance. Our method achieves higher cross-modality matching accuracy with minimal increases in model parameters and computation time. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the effectiveness of the proposed framework and show that it outperforms most existing state-of-the-art approaches. The source code is available at https://github.com/hulu88/CM_DASN.
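To make the head-selection idea concrete, the PyTorch sketch below illustrates softmax score-based selection of attention heads. It is a minimal, hypothetical illustration rather than the authors' implementation: the class name HeadSelector and the parameters num_heads and top_k are our own inventions, and the repository linked above should be consulted for the actual CM-DASN logic.

    import torch
    import torch.nn as nn

    class HeadSelector(nn.Module):
        """Minimal sketch: softmax score-based attention-head selection.

        Each per-head output of a multi-head attention layer gets a
        learnable score; a softmax over the scores ranks the heads, and
        only the top-k highest-scoring heads are fused into the output.
        """

        def __init__(self, num_heads: int, top_k: int):
            super().__init__()
            self.top_k = top_k
            # One learnable logit per attention head.
            self.head_logits = nn.Parameter(torch.zeros(num_heads))

        def forward(self, head_outputs: torch.Tensor) -> torch.Tensor:
            # head_outputs: (batch, num_heads, tokens, head_dim)
            scores = torch.softmax(self.head_logits, dim=0)
            top_scores, top_idx = scores.topk(self.top_k)
            selected = head_outputs[:, top_idx]        # keep the best heads
            weights = top_scores / top_scores.sum()    # re-normalize scores
            # Weighted fusion of the selected heads.
            return (selected * weights.view(1, -1, 1, 1)).sum(dim=1)

    # Example: 12 ViT heads, keep the 6 highest-scoring ones.
    selector = HeadSelector(num_heads=12, top_k=6)
    heads = torch.randn(4, 12, 197, 64)   # (batch, heads, tokens, head_dim)
    print(selector(heads).shape)          # torch.Size([4, 197, 64])

Note that hard top-k indexing is not differentiable through the choice of heads itself (gradients flow only through the retained scores), so a practical implementation might use soft gating early in training and harden the selection later, which would be consistent with the late-stage selection schedule described in the abstract.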
Data availability
The datasets used in this study are publicly available and can be accessed through their official websites or academic institutions.
Author information
Contributions
Yuxin Li and Hu Lu wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by Haojie Li.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Lu, H., Qin, T. et al. CM-DASN: visible-infrared cross-modality person re-identification via dynamic attention selection network. Multimedia Systems 31, 138 (2025). https://doi.org/10.1007/s00530-025-01724-6