
MFEN: Multi-scale Feature Expansion Network for Visible-Infrared Person Re-Identification

Published: 01 June 2024

Abstract

One of the main challenges of visible-infrared person re-identification (Re-ID) is the large discrepancy between heterogeneous images. To alleviate this problem, many existing studies adopt a dual-stream learning framework to learn discriminative features, but their performance is limited by insufficient training data. To address this issue, a feature-level data augmentation network, named MFEN, is proposed in this paper. Through the designed Global-local Feature Augmentation (GFA) module, the network generates additional feature embeddings and combines global and local features, which benefits the learning of modality-invariant features. To further enhance robustness, a Multi-stage Feature Integration (MFI) block is designed to mitigate information loss and aggregate multi-scale features. In addition, a multilayer perceptron (MLP) layer is appended to the end of the network to further improve feature representations. Extensive experiments on the SYSU-MM01 and RegDB datasets validate the superior performance of the proposed method over existing methods.
Keywords: person re-identification; feature augmentation; visible-infrared person re-identification; multi-scale feature aggregation
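The abstract only sketches the architecture at a high level; the exact designs of GFA, MFI, and the MLP head are defined in the paper itself. As an illustration of the general ideas (global plus horizontal-part pooling of a feature map, projection and summation of multi-stage features, and an MLP head), here is a minimal NumPy sketch; all shapes, part counts, and random projections are assumptions for demonstration, not the authors' implementation:

```python
import numpy as np

def global_local_features(fmap, num_parts=3):
    """Combine a global descriptor with horizontal-part descriptors.

    fmap: (C, H, W) feature map from one backbone stage.
    Returns a (C * (1 + num_parts),) embedding.
    """
    global_feat = fmap.mean(axis=(1, 2))                 # (C,)
    # Split the height axis into horizontal stripes for local features.
    parts = np.array_split(fmap, num_parts, axis=1)
    local_feats = [p.mean(axis=(1, 2)) for p in parts]   # num_parts x (C,)
    return np.concatenate([global_feat] + local_feats)

def aggregate_stages(stage_feats, dim=128, seed=0):
    """Project each stage's vector to a common dimension and sum,
    mimicking multi-scale aggregation across backbone stages."""
    rng = np.random.default_rng(seed)
    agg = np.zeros(dim)
    for f in stage_feats:
        W = rng.standard_normal((dim, f.shape[0])) / np.sqrt(f.shape[0])
        agg += W @ f
    return agg

def mlp_head(x, hidden=256, out=128, seed=1):
    """Two-layer MLP with ReLU, refining the aggregated feature."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((hidden, x.shape[0])) / np.sqrt(x.shape[0])
    W2 = rng.standard_normal((out, hidden)) / np.sqrt(hidden)
    return W2 @ np.maximum(W1 @ x, 0.0)

# Toy forward pass with an assumed 64-channel, 24x8 feature map.
fmap = np.random.default_rng(42).standard_normal((64, 24, 8))
emb = global_local_features(fmap, num_parts=3)        # (256,)
agg = aggregate_stages([emb, emb[:128]], dim=128)     # (128,)
out = mlp_head(agg)                                   # (128,)
```

In a trained network the random projections above would of course be learned weights; the sketch only shows how global/local pooling and multi-stage summation compose into a single embedding.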



Published In

CVDL '24: Proceedings of the International Conference on Computer Vision and Deep Learning
January 2024
506 pages
ISBN:9798400718199
DOI:10.1145/3653804

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

  • Beijing Natural Science Foundation
  • National Science Foundation of China

Conference

CVDL 2024
