DOI: 10.1145/3503161.3547975
research-article

Keypoint-Guided Modality-Invariant Discriminative Learning for Visible-Infrared Person Re-identification

Published: 10 October 2022

Abstract

The visible-infrared person re-identification (VI-ReID) task aims to retrieve images of pedestrians across cameras of different modalities. The major challenges in this task are twofold: intra-class variations among images of the same identity, and cross-modality discrepancies between visible and infrared images. Existing methods mainly focus on the latter, attempting to alleviate the impact of the modality discrepancy; they ignore the former issue of identity variations and therefore achieve limited discrimination. To address both aspects, we propose a Keypoint-guided Modality-invariant Discriminative Learning (KMDL) method, which simultaneously adapts to intra-ID variations and bridges the cross-modality gap. By introducing human keypoints, our method addresses these issues at three levels: the image space, the feature space, and the loss constraints. Specifically, considering the modality discrepancy in the original images, we first design a Hue Jitter Augmentation (HJA) strategy, which introduces hue disturbance to alleviate color dependence at the input stage. To obtain discriminative fine-grained representations for retrieval, we design the Global-Keypoint Graph Module (GKGM) in the feature space, which directly extracts keypoint-aligned features and mines relationships within the global and keypoint embeddings. Based on these semantic local embeddings, we further propose the Keypoint-Aware Center (KAC) loss, which adjusts the feature distribution under the supervision of identity and keypoint to learn discriminative representations for matching. Extensive experiments on the SYSU-MM01 and RegDB datasets demonstrate the effectiveness of our KMDL method.
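
The abstract does not give formulas for HJA or the KAC loss, so the following is only an illustrative sketch of the two general ideas, not the authors' implementation. It assumes that hue jitter means rotating every pixel's hue by one random offset, and that a keypoint-aware center loss can be approximated by pulling each local feature toward the mean of its (identity, keypoint) group. The function names, the `max_shift` parameter, and the grouping rule are hypothetical.

```python
import colorsys
import random


def hue_jitter(pixels, max_shift=0.5, rng=None):
    """Rotate the hue of every RGB pixel by one shared random offset.

    pixels: list of (r, g, b) tuples with channels in [0, 1].
    max_shift: assumed bound on the offset, as a fraction of the hue circle.
    Saturation and value are left untouched, so only color identity changes.
    """
    if rng is None:
        rng = random.Random()
    shift = rng.uniform(-max_shift, max_shift)
    jittered = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        jittered.append(colorsys.hsv_to_rgb((h + shift) % 1.0, s, v))
    return jittered


def keypoint_center_loss(features, ids, keypoints):
    """Mean squared distance of each feature to its (id, keypoint) center.

    This is a plain center-style loss with an assumed grouping key; the
    paper's actual KAC loss may differ.
    """
    groups = {}
    for feat, pid, kp in zip(features, ids, keypoints):
        groups.setdefault((pid, kp), []).append(feat)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        center = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        for m in members:
            total += sum((m[d] - center[d]) ** 2 for d in range(dim))
    return total / len(features)
```

Under these assumptions, a grey pixel passes through `hue_jitter` unchanged because it has zero saturation, which mirrors the intuition that infrared-like, colorless content is unaffected while color cues in visible images are perturbed.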

Supplementary Material

MP4 File (MM22-fp0928.mp4)
Brief introduction of the paper 'Keypoint-Guided Modality-Invariant Discriminative Learning for Visible-Infrared Person Re-identification'



    Published In

    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. cross-modality
    2. data augmentation
    3. graph
    4. keypoint
    5. vi-reid

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
