DOI: 10.1145/3581783.3612479
research-article

Video-based Visible-Infrared Person Re-Identification via Style Disturbance Defense and Dual Interaction

Published: 27 October 2023

Abstract

Video-based visible-infrared person re-identification (VVI-ReID) aims to retrieve video sequences of the same pedestrian across different modalities. The key to VVI-ReID is learning discriminative sequence-level representations that are invariant to both intra- and inter-modal discrepancies. However, most works focus only on eliminating the modality gap while ignoring distractors within each modality. Moreover, existing sequence-level representation learning approaches are limited to a single video and fail to mine the correlations among multiple videos of the same pedestrian. In this paper, we propose a Style Augmentation, Attack and Defense network with Graph-based dual interaction (SAADG) to guarantee semantic consistency against both intra-modal discrepancies and the inter-modal gap. Specifically, we first generate diverse styles for video frames by random style variation in image space. Then, through style attack and defense, the intra- and inter-modal discrepancies are modeled as different types of style disturbance (attack), and our model learns to keep the identity-related content invariant under such attacks. In addition, a graph-based dual interaction module is introduced to fully explore the cross-view and cross-modal correlations among various videos of the same identity, and these correlations are then transferred to the sequence-level representations. Extensive experiments on the public SYSU-MM01 and HITSZ-VCM datasets show that our approach achieves remarkable performance compared with state-of-the-art methods. The code is available at https://github.com/ChuhaoZhou99/SAADG_VVIReID.
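The abstract does not say how styles are varied or how the interaction graph is built, so the following is only a minimal sketch, assuming NumPy, of two standard techniques that fit its description. The first is AdaIN-style re-normalization of per-channel statistics (Huang and Belongie), used here as a stand-in for "random style variation in image space". The second is one normalized graph-convolution propagation step (Kipf and Welling) over the sequence-level features of several videos of one identity. The function names and the `strength` parameter are hypothetical; this is not the SAADG implementation, which is in the linked repository.

```python
import numpy as np


def random_style_variation(frame, rng, strength=0.5, eps=1e-6):
    """Randomly perturb the per-channel 'style' statistics of one frame.

    frame: float array of shape (H, W, C). The per-channel standardized
    pattern (treated here as the content) is kept; only each channel's
    mean and standard deviation are rescaled, in the spirit of AdaIN.
    `strength` is a hypothetical knob, not a parameter from the paper.
    """
    assert 0.0 <= strength < 1.0, "keeps the rescaled std positive"
    mu = frame.mean(axis=(0, 1), keepdims=True)
    sigma = frame.std(axis=(0, 1), keepdims=True) + eps
    content = (frame - mu) / sigma
    # Multiplicative jitter in [1 - strength, 1 + strength] per channel.
    new_mu = mu * (1.0 + strength * rng.uniform(-1.0, 1.0, size=mu.shape))
    new_sigma = sigma * (1.0 + strength * rng.uniform(-1.0, 1.0, size=sigma.shape))
    return content * new_sigma + new_mu


def graph_interaction(features, eps=1e-8):
    """One normalized graph-propagation step over videos of one identity.

    features: (N, D) array, one sequence-level feature per video (any
    camera view or modality). Edge weights are non-negative cosine
    similarities plus self-loops, normalized as D^-1/2 A D^-1/2 (Kipf and
    Welling). The learned weight matrix and nonlinearity are omitted.
    """
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    adj = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(adj, 1.0)  # self-loops keep each node's own feature
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return norm_adj @ features
```

Because only channel statistics change, standardizing the styled frame per channel recovers the original standardized frame. That invariance is the kind of property the "defense" side of such a scheme would ask the feature extractor to reproduce at the feature level. The propagation step mixes each video's feature with those of similar videos of the same identity, which is one simple way to let cross-view and cross-modal correlations flow into each sequence-level representation.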

Supplemental Material

MP4 File
Presentation video of the paper "Video-based Visible-Infrared Person Re-Identification via Style Disturbance Defense and Dual Interaction".




    Published In

    ACM Conferences
    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-modality
    2. data augmentation
    3. graph
    4. vi-reid

    Qualifiers

    • Research-article

    Funding Sources

    • Shenzhen Colleges and Universities Stable Support Program
    • Shenzhen Fundamental Research Fund
    • Yunnan Fundamental Research Projects
    • Shenzhen Science and Technology Program
    • National Natural Science Foundation of China
    • Shenzhen Key Technical Project
    • Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 278
    • Downloads (last 6 weeks): 14
    Reflects downloads up to 05 Mar 2025

    Citations

    Cited By
    • (2025) Progressive Cross-Modal Association Learning for Unsupervised Visible-Infrared Person Re-Identification. IEEE Transactions on Information Forensics and Security, Vol. 20, 1290-1304. DOI: 10.1109/TIFS.2025.3527356. Online publication date: 2025.
    • (2025) Hierarchical disturbance and Group Inference for video-based visible-infrared person re-identification. Information Fusion, Vol. 117, 102882. DOI: 10.1016/j.inffus.2024.102882. Online publication date: May 2025.
    • (2025) Efficient feature difference-based infrared and visible image fusion for low-light environments. The Visual Computer. DOI: 10.1007/s00371-025-03841-9. Online publication date: 19 Feb 2025.
    • (2024) A Three-Stage Framework for Video-Based Visible-Infrared Person Re-Identification. IEEE Signal Processing Letters, Vol. 31, 1254-1258. DOI: 10.1109/LSP.2024.3394676. Online publication date: 2024.
    • (2024) Sequential attention layer-wise fusion network for multi-view classification. International Journal of Machine Learning and Cybernetics, Vol. 15, No. 12, 5549-5561. DOI: 10.1007/s13042-024-02260-x. Online publication date: 1 Jul 2024.