research-article

MAENet: Boosting Feature Representation for Cross-Modal Person Re-Identification with Pairwise Supervision

Authors:
Yongbiao Chen, Shanghai Jiao Tong University, Shanghai, China
Sheng Zhang, Shanghai Jiao Tong University, Shanghai, China
Zhengwei Qi, Shanghai Jiao Tong University, Shanghai, China
ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval, June 2020, Pages 442–449, https://doi.org/10.1145/3372278.3390699

Published: 08 June 2020

ABSTRACT

Person re-identification aims to retrieve the images of a specific person from a gallery dataset given a probe image. Among the research areas related to person re-identification, visible-to-thermal person re-identification (VT-REID) has gained increasing momentum. VT-REID is considered a challenging task owing to the large cross-modality gap [25], cross-modality variation, and intra-modality variation. Existing techniques generally tackle this problem by embedding cross-modality data with convolutional neural networks into a shared feature space to bridge the cross-modality discrepancy, and subsequently devise hinge losses on similarity learning to alleviate the variation. However, feature extraction methods based solely on convolutional neural networks may fail to capture distinctive and modality-invariant features, introducing noise into subsequent re-identification steps. In this work, we present a novel modality- and appearance-invariant embedding learning framework equipped with maximum likelihood learning to perform cross-modal person re-identification. Extensive experiments are conducted to test the effectiveness of our framework. The results demonstrate that the proposed framework yields state-of-the-art Re-ID accuracy on the RegDB and SYSU-MM01 datasets.
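
To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of pairwise supervision it mentions: a hinge (margin) loss on a similarity score, and a maximum-likelihood loss that models the probability that a visible/thermal pair shares an identity. It is not the MAENet formulation, which the abstract does not specify. The function names, the cosine similarity, the logistic likelihood, and the `margin` and `scale` values are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def hinge_pair_loss(sim, same_identity, margin=0.5):
    """Hinge-style loss: pull matching pairs toward sim = 1,
    push non-matching pairs below the (hypothetical) margin."""
    if same_identity:
        return max(0.0, 1.0 - sim)
    return max(0.0, sim - margin)


def pairwise_nll(sim, same_identity, scale=5.0):
    """Negative log-likelihood of the pair label under an assumed
    logistic model p(same | pair) = sigmoid(scale * sim)."""
    z = scale * sim
    # -log(sigmoid(z)) = softplus(-z); -log(1 - sigmoid(z)) = softplus(z)
    return softplus(-z) if same_identity else softplus(z)


# Toy embeddings standing in for CNN features of each modality.
visible = [1.0, 0.0]
thermal_same = [0.9, 0.1]    # same person, thermal camera
thermal_other = [0.0, 1.0]   # different person, thermal camera

sim_same = cosine_similarity(visible, thermal_same)
sim_other = cosine_similarity(visible, thermal_other)

print(hinge_pair_loss(sim_same, True), hinge_pair_loss(sim_other, False))
print(pairwise_nll(sim_same, True), pairwise_nll(sim_other, False))
```

In this sketch the hinge loss is exactly zero once a non-matching pair falls below the margin, whereas the likelihood-based loss keeps a nonzero penalty for every pair. That difference is one common motivation for likelihood-based pairwise objectives; whether it is the motivation behind MAENet is not stated in the abstract.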

References

Ejaz Ahmed, Michael Jones, and Tim K Marks. 2015. An improved deep learning architecture for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3908--3916.
Yue Cao, Mingsheng Long, Bin Liu, and Jianmin Wang. 2018. Deep cauchy hashing for hamming space retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1229--1237.
Yue Cao, Mingsheng Long, Jianmin Wang, and Han Zhu. 2016. Correlation autoencoder hashing for supervised cross-modal search. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval. ACM, 197--204.
Pingyang Dai, Rongrong Ji, Haibin Wang, Qiong Wu, and Yuyu Huang. 2018. Cross-Modality Person Re-Identification with Generative Adversarial Training. In IJCAI. 677--683.
Zhanxiang Feng, Jianhuang Lai, and Xiaohua Xie. 2018. Learning view-specific deep networks for person re-identification. IEEE Transactions on Image Processing, Vol. 27, 7 (2018), 3472--3483.
Yi Hao, Nannan Wang, Xinbo Gao, Jie Li, and Xiaoyu Wang. 2019a. Dual-alignment Feature Embedding for Cross-modality Person Re-identification. In Proceedings of the 27th ACM International Conference on Multimedia. 57--65.
Yi Hao, Nannan Wang, Jie Li, and Xinbo Gao. 2019b. HSME: hypersphere manifold embedding for visible thermal person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 8385--8392.
Albert Haque, Alexandre Alahi, and Li Fei-Fei. 2016. Recurrent attention models for depth-based person identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1229--1238.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.
De-An Huang and Yu-Chiang Frank Wang. 2013. Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In Proceedings of the IEEE international conference on computer vision. 2496--2503.
Martin Koestinger, Martin Hirzer, Paul Wohlhart, Peter M Roth, and Horst Bischof. 2012. Large scale metric learning from equivalence constraints. In 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2288--2295.
Shuang Li, Tong Xiao, Hongsheng Li, Wei Yang, and Xiaogang Wang. 2017a. Identity-aware textual-visual matching with latent co-attention. In Proceedings of the IEEE International Conference on Computer Vision. 1890--1899.
Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, and Xiaogang Wang. 2017b. Person search with natural language description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1970--1979.
Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. 2014a. Deepreid: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 152--159.
Wei Li, Rui Zhao, Tong Xiao, and Xiaogang Wang. 2014b. Deepreid: Deep filter pairing neural network for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 152--159.
Xiang Li, Wei-Shi Zheng, Xiaojuan Wang, Tao Xiang, and Shaogang Gong. 2015. Multi-scale learning for low-resolution person re-identification. In Proceedings of the IEEE International Conference on Computer Vision. 3765--3773.
Zhen Li, Shiyu Chang, Feng Liang, Thomas S Huang, Liangliang Cao, and John R Smith. 2013. Learning locally-adaptive decision functions for person verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3610--3617.
Shengcai Liao, Yang Hu, Xiangyu Zhu, and Stan Z Li. 2015a. Person re-identification by local maximal occurrence representation and metric learning. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2197--2206.
Shengcai Liao, Yang Hu, Xiangyu Zhu, and Stan Z Li. 2015b. Person re-identification by local maximal occurrence representation and metric learning. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2197--2206.
Shengcai Liao and Stan Z Li. 2015. Efficient psd constrained asymmetric metric learning for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision. 3685--3693.
Liang Lin, Guangrun Wang, Wangmeng Zuo, Xiangchu Feng, and Lei Zhang. 2016. Cross-domain visual matching via generalized similarity measure and feature learning. IEEE transactions on pattern analysis and machine intelligence, Vol. 39, 6 (2016), 1089--1102.
Matteo Munaro, Alberto Basso, Andrea Fossati, Luc Van Gool, and Emanuele Menegatti. 2014. 3D reconstruction of freely moving persons for re-identification with a depth sensor. In 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 4512--4519.
Dat Nguyen, Hyung Hong, Ki Kim, and Kang Park. 2017. Person recognition system based on a combination of body images from visible light and thermal cameras. Sensors, Vol. 17, 3 (2017), 605.
M Saquib Sarfraz and Rainer Stiefelhagen. 2017. Deep perceptual mapping for cross-modal face recognition. International Journal of Computer Vision, Vol. 122, 3 (2017), 426--438.
Arnold WM Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. 2000. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis & Machine Intelligence 12 (2000), 1349--1380.
Yifan Sun, Liang Zheng, Weijian Deng, and Shengjin Wang. 2017. Svdnet for pedestrian retrieval. In Proceedings of the IEEE International Conference on Computer Vision. 3800--3808.
Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, and Shengjin Wang. 2018. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV). 480--496.
Rahul Rama Varior, Bing Shuai, Jiwen Lu, Dong Xu, and Gang Wang. 2016. A siamese long short-term memory architecture for human re-identification. In European conference on computer vision. Springer, 135--153.
Shenlong Wang, Lei Zhang, Yan Liang, and Quan Pan. 2012. Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2216--2223.
Zheng Wang, Ruimin Hu, Yi Yu, Junjun Jiang, Chao Liang, and Jinqiao Wang. 2016. Scale-Adaptive Low-Resolution Person Re-Identification via Learning a Discriminating Surface. In IJCAI. 2669--2675.
Ancong Wu, Wei-Shi Zheng, and Jian-Huang Lai. 2017a. Robust depth-based person re-identification. IEEE Transactions on Image Processing, Vol. 26, 6 (2017), 2588--2603.
Ancong Wu, Wei-Shi Zheng, Hong-Xing Yu, Shaogang Gong, and Jianhuang Lai. 2017b. Rgb-infrared cross-modality person re-identification. In Proceedings of the IEEE International Conference on Computer Vision. 5380--5389.
Lin Wu, Chunhua Shen, and Anton van den Hengel. 2016b. Personnet: Person re-identification with deep convolutional neural networks. arXiv preprint arXiv:1601.07255 (2016).
Lin Wu, Yang Wang, Xue Li, and Junbin Gao. 2018. What-and-where to match: Deep spatially multiplicative integration networks for person re-identification. Pattern Recognition, Vol. 76 (2018), 727--738.
Shangxuan Wu, Ying-Cong Chen, Xiang Li, An-Cong Wu, Jin-Jie You, and Wei-Shi Zheng. 2016a. An enhanced deep feature representation for person re-identification. In 2016 IEEE winter conference on applications of computer vision (WACV). IEEE, 1--8.
Tong Xiao, Hongsheng Li, Wanli Ouyang, and Xiaogang Wang. 2016. Learning deep feature representations with domain guided dropout for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1249--1258.
Mang Ye, Xiangyuan Lan, and Qingming Leng. 2019a. Modality-aware Collaborative Learning for Visible Thermal Person Re-Identification. In Proceedings of the 27th ACM International Conference on Multimedia. 347--355.
Mang Ye, Xiangyuan Lan, Jiawei Li, and Pong C Yuen. 2018a. Hierarchical discriminative learning for visible thermal person re-identification. In Thirty-Second AAAI Conference on Artificial Intelligence.
Mang Ye, Xiangyuan Lan, Zheng Wang, and Pong C Yuen. 2019b. Bi-directional Center-Constrained Top-Ranking for Visible Thermal Person Re-Identification. IEEE Transactions on Information Forensics and Security (2019).
Mang Ye, Chao Liang, Zheng Wang, Qingming Leng, Jun Chen, and Jun Liu. 2015. Specific person retrieval via incomplete text description. In Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 547--550.
Mang Ye, Zheng Wang, Xiangyuan Lan, and Pong C Yuen. 2018b. Visible Thermal Person Re-Identification via Dual-Constrained Top-Ranking. In IJCAI. 1092--1099.
Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z Li. 2014. Deep metric learning for person re-identification. In 2014 22nd International Conference on Pattern Recognition. IEEE, 34--39.
Zhou Yin, Wei-Shi Zheng, Ancong Wu, Hong-Xing Yu, Hai Wan, Xiaowei Guo, Feiyue Huang, and Jianhuang Lai. 2017. Adversarial attribute-image person re-identification. arXiv preprint arXiv:1712.01493 (2017).
Hong-Xing Yu, Wei-Shi Zheng, Ancong Wu, Xiaowei Guo, Shaogang Gong, and Jian-Huang Lai. 2019. Unsupervised Person Re-identification by Soft Multilabel Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2148--2157.
Li Zhang, Tao Xiang, and Shaogang Gong. 2016b. Learning a discriminative null space for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1239--1248.
Ying Zhang, Baohua Li, Huchuan Lu, Atshushi Irie, and Xiang Ruan. 2016a. Sample-specific svm learning for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1278--1287.
Liang Zheng, Yi Yang, and Qi Tian. 2017. SIFT meets CNN: A decade survey of instance retrieval. IEEE transactions on pattern analysis and machine intelligence, Vol. 40, 5 (2017), 1224--1244.

Index Terms

MAENet: Boosting Feature Representation for Cross-Modal Person Re-Identification with Pairwise Supervision
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search

Published in
ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval
June 2020
605 pages
ISBN: 9781450370875
DOI: 10.1145/3372278
General Chairs:
Cathal Gurrin, Dublin City University, Ireland
Björn Þór Jónsson, IT University of Copenhagen, Denmark
Noriko Kando, National Institute of Informatics, Tokyo
Program Chairs:
Klaus Schoeffmann, Klagenfurt University, Austria
Phoebe Chen, La Trobe University, Australia
Noel E. O'Connor, Dublin City University, Ireland
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cross-modal retrieval
deep learning
information retrieval
person re-identification
Acceptance Rates
Overall Acceptance Rate: 254 of 830 submissions, 31%