DOI: 10.1145/3652583.3658109

Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

Published: 07 June 2024

Abstract

Visible-Infrared Person Re-identification (VI ReID) aims to match visible and infrared images of the same pedestrians across non-overlapping camera views. The two input modalities contain both modality-invariant information, such as shape, and modality-specific details, such as color. An ideal model should exploit valuable information from both modalities during training for enhanced representational capability. However, the gap caused by modality-specific information makes it difficult for a VI ReID model to handle distinct modality inputs simultaneously. To address this, we introduce the Modality-aware and Instance-aware Visual Prompts (MIP) network, designed to effectively exploit both invariant and specific information for identification. Specifically, our MIP model is built on the transformer architecture. Within it, we design a series of modality-specific prompts that enable the model to adapt to, and make use of, the specific information inherent in different modality inputs, thereby reducing the interference caused by the modality gap and improving identification. In addition, we employ each pedestrian feature to construct a group of instance-specific prompts. These customized prompts guide the model to adapt to each pedestrian instance dynamically, capturing identity-level discriminative clues for identification. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the effectiveness of both designed modules, and our proposed MIP outperforms most state-of-the-art methods.
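The core mechanism the abstract describes, prepending modality-specific prompt tokens to the patch-token sequence and generating instance-specific prompts from each pedestrian feature, can be sketched as follows. This is a minimal illustration only: the names, toy dimensions, and the linear prompt generator are assumptions for exposition, not the authors' actual MIP implementation.

```python
import numpy as np

EMBED_DIM = 8      # token embedding size (toy value)
NUM_PATCHES = 4    # patch tokens per image (toy value)
NUM_PROMPTS = 2    # prompt tokens per modality / per instance

rng = np.random.default_rng(0)

# One learnable prompt bank per input modality (visible vs. infrared),
# letting a shared transformer condition on modality-specific information.
modality_prompts = {
    "visible": rng.standard_normal((NUM_PROMPTS, EMBED_DIM)),
    "infrared": rng.standard_normal((NUM_PROMPTS, EMBED_DIM)),
}

def inject_modality_prompts(patch_tokens, modality):
    """Prepend the modality-specific prompts to the patch-token sequence."""
    return np.concatenate([modality_prompts[modality], patch_tokens], axis=0)

def make_instance_prompts(pedestrian_feature, generator_weight):
    """Map a pedestrian feature to a group of instance-specific prompts via
    a linear generator (a stand-in for the paper's prompt construction)."""
    return (generator_weight @ pedestrian_feature).reshape(NUM_PROMPTS, EMBED_DIM)

patches = rng.standard_normal((NUM_PATCHES, EMBED_DIM))
tokens = inject_modality_prompts(patches, "infrared")
print(tokens.shape)  # sequence grows from 4 to 6 tokens: (6, 8)

feature = rng.standard_normal(EMBED_DIM)
W = rng.standard_normal((NUM_PROMPTS * EMBED_DIM, EMBED_DIM))
inst = make_instance_prompts(feature, W)
print(inst.shape)  # one prompt group per instance: (2, 8)
```

In a real model the prompt banks and generator would be trained jointly with the transformer; here they are random arrays purely to show the token-level bookkeeping.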


Cited By

  • (2024) Visible-Infrared Person Re-identification Based on Deep-Shallow Spatial-Frequency Feature Fusion. In Proceedings of the International Conference on Image Processing, Machine Learning and Pattern Recognition, 318--324. DOI: 10.1145/3700906.3700958. Online publication date: 8-Dec-2024.


Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
May 2024
1379 pages
ISBN:9798400706196
DOI:10.1145/3652583

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. cross-modality person re-identification
  2. visible-infrared person re-identification
  3. visual prompt learning

Qualifiers

  • Research-article

Conference

ICMR '24

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%


Article Metrics

  • Downloads (last 12 months): 173
  • Downloads (last 6 weeks): 19
Reflects downloads up to 16 Feb 2025.

