DOI: 10.1145/3652583.3658109

Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

Published: 07 June 2024

Abstract

Visible-Infrared Person Re-identification (VI ReID) aims to match visible and infrared images of the same pedestrians across non-overlapping camera views. The two input modalities contain both modality-invariant information, such as shape, and modality-specific details, such as color. An ideal model should exploit valuable information from both modalities during training for enhanced representational capability. However, the gap caused by modality-specific information makes it difficult for a VI ReID model to handle distinct modality inputs simultaneously. To address this, we introduce the Modality-aware and Instance-aware Visual Prompts (MIP) network, designed to effectively exploit both invariant and specific information for identification. Specifically, our MIP model is built on the transformer architecture. Within it, we design a series of modality-specific prompts that enable the model to adapt to, and make use of, the specific information inherent in different modality inputs, thereby reducing the interference caused by the modality gap and improving identification. In addition, we employ each pedestrian feature to construct a group of instance-specific prompts. These customized prompts guide the model to adapt to each pedestrian instance dynamically, capturing identity-level discriminative clues for identification. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the effectiveness of both designed modules, and our proposed MIP outperforms most state-of-the-art methods.
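The core mechanism the abstract describes, prepending modality-specific prompt tokens to the patch-token sequence and generating instance-specific prompts from each pedestrian feature, can be sketched as follows. This is a minimal illustration only: the names, toy dimensions, and the linear prompt generator are assumptions for exposition, not the authors' actual MIP implementation.

```python
import numpy as np

EMBED_DIM = 8      # token embedding size (toy value)
NUM_PATCHES = 4    # patch tokens per image (toy value)
NUM_PROMPTS = 2    # prompt tokens per modality / per instance

rng = np.random.default_rng(0)

# One learnable prompt bank per input modality (visible vs. infrared),
# letting a shared transformer condition on modality-specific information.
modality_prompts = {
    "visible": rng.standard_normal((NUM_PROMPTS, EMBED_DIM)),
    "infrared": rng.standard_normal((NUM_PROMPTS, EMBED_DIM)),
}

def inject_modality_prompts(patch_tokens, modality):
    """Prepend the modality-specific prompts to the patch-token sequence."""
    return np.concatenate([modality_prompts[modality], patch_tokens], axis=0)

def make_instance_prompts(pedestrian_feature, generator_weight):
    """Map a pedestrian feature to a group of instance-specific prompts via
    a linear generator (a stand-in for the paper's prompt construction)."""
    return (generator_weight @ pedestrian_feature).reshape(NUM_PROMPTS, EMBED_DIM)

patches = rng.standard_normal((NUM_PATCHES, EMBED_DIM))
tokens = inject_modality_prompts(patches, "infrared")
print(tokens.shape)  # sequence grows from 4 to 6 tokens: (6, 8)

feature = rng.standard_normal(EMBED_DIM)
W = rng.standard_normal((NUM_PROMPTS * EMBED_DIM, EMBED_DIM))
inst = make_instance_prompts(feature, W)
print(inst.shape)  # one prompt group per instance: (2, 8)
```

In a real model the prompt banks and generator would be trained jointly with the transformer; here they are random arrays purely to show the token-level bookkeeping.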


Cited By

  • (2024) Visible-Infrared Person Re-identification Based on Deep-Shallow Spatial-Frequency Feature Fusion. In Proceedings of the International Conference on Image Processing, Machine Learning and Pattern Recognition, 318--324. DOI: 10.1145/3700906.3700958. Online publication date: 8-Dec-2024.


Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
May 2024
1379 pages
ISBN:9798400706196
DOI:10.1145/3652583

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cross-modality person re-identification
  2. visible-infrared person re-identification
  3. visual prompt learning

Qualifiers

  • Research-article

Conference

ICMR '24

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%


Article Metrics

  • Downloads (last 12 months): 173
  • Downloads (last 6 weeks): 19
Reflects downloads up to 16 Feb 2025.

