Improving person re-identification by multi-task learning
Introduction
Person re-identification (re-ID) is of great significance in security applications. It is usually formulated as an image retrieval problem, which matches pedestrians across different cameras and ranks the gallery images according to their similarity to the query. Existing methods mainly focus on extracting robust representations [3], [4], [5], [6], [7], [8], [9], [10], [11] and on learning matching functions or metrics [5], [12], [13], [14], [15], [16], [17]. Given its excellent performance on computer vision tasks, deep learning has also been adopted by the re-ID community [6], [18], [19], [20], [21] and has achieved promising results.
This paper aims at learning to extract robust representations and improving person re-ID performance on large-scale datasets. Both person identity and attributes are essential information in surveillance videos. In general, the identity feature can be regarded as an overall feature, determined by the overall contour and appearance, while an attribute feature relates only to particular aspects of a pedestrian. Since the identity feature carries global information and attribute features focus on details, identity features are more effective for re-ID and attribute features are more effective for attribute recognition. As noted in [22], multi-task learning can yield more robust features, and these two tasks are complementary to each other. As shown in Fig. 1, the identity task and the attribute task help each other achieve better results. For example, the classification system fails on the first query example because of similar appearance, such as the yellow upper clothing and black shorts. Nevertheless, with the gender attribute, the boy in yellow clothing is excluded.
To our knowledge, there are three kinds of deep learning methods for person re-identification. The classification model [1], [6], [19] is devoted to distinguishing different samples. In these methods, images from different categories may lie very close to each other in the feature space, which makes it difficult to correctly identify new samples or new identities. The verification model [23], [24] was therefore proposed to minimize the distance between similar images and maximize the distance between dissimilar ones. However, verification models have a weak ability to expand the feature space. Besides, a number of previous works [18], [25] treat person re-identification as a binary classification task or a similarity regression task: given an input pair of images, the network determines whether the two images depict the same person.
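The verification objective sketched above, pulling matched pairs together while pushing mismatched pairs apart, is commonly realized as a margin-based contrastive loss. The following is a minimal NumPy sketch of that standard loss, not the exact formulation used in this paper; the function name and the margin value are illustrative:

```python
import numpy as np

def contrastive_loss(f1, f2, same_identity, margin=1.0):
    """Verification loss on a pair of embeddings (illustrative form).

    Pulls same-person embeddings together and pushes different-person
    embeddings at least `margin` apart.
    """
    d = np.linalg.norm(f1 - f2)           # Euclidean distance between features
    if same_identity:
        return d ** 2                     # positive pair: minimize distance
    return max(0.0, margin - d) ** 2      # negative pair: enforce the margin

# A matched pair with identical features incurs zero loss, while a
# mismatched pair lying closer than the margin is penalized.
a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])
c = np.array([0.5, 0.0])
pos = contrastive_loss(a, b, same_identity=True)   # 0.0
neg = contrastive_loss(a, c, same_identity=False)  # (1.0 - 0.5)**2 = 0.25
```

The margin keeps the negative term bounded: once a mismatched pair is farther apart than the margin, it contributes no gradient.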
In our paper, MTNet takes advantage of the two tasks of identity and attribute prediction by combining the classification and verification approaches. In our framework, verification is applied to the attributes as well, which compensates, to a certain extent, for the shortcomings of a plain attribute recognition model. As shown in Fig. 1, we define an identity label and a set of attribute labels for every pedestrian (the second boxes). Based on these labels, we use four independent branches to train a robust multi-task network (the third boxes), which then performs both the person re-identification task and the person attribute recognition task (the fourth boxes).
Our main contributions are summarized as follows:
- •
We propose a multi-task learning network (MTNet). It learns an end-to-end CNN embedding for person re-ID and an attribute prediction model simultaneously. As shown in Fig. 2, four different tasks are integrated into MTNet: identity classification, identity verification, attribute classification and attribute verification.
- •
We employ a verification loss for the attribute task, which, to our knowledge, is the first time this has been done in person re-ID. The attribute verification task not only assists the attribute classification task but also promotes the identity verification task.
- •
We achieve state-of-the-art results on two large-scale person re-ID datasets, Market1501 [1] and DukeMTMC-reID [2].
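For illustration, the four sub-task losses listed above are typically combined into one weighted training objective. The sketch below assumes standard softmax cross-entropy for the two classification heads and treats the two verification losses as precomputed scalars; the function names and the uniform weighting are hypothetical examples, not the paper's reported configuration:

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (used for both the
    identity and the attribute classification heads)."""
    z = logits - logits.max()                    # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multi_task_loss(id_logits, id_label,
                    attr_logits, attr_labels,
                    loss_id_verif, loss_attr_verif,
                    weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four sub-task losses. The weights are
    hypothetical hyperparameters, not values from the paper."""
    l_id_cls = cross_entropy(id_logits, id_label)
    # one classifier per attribute; average their losses
    l_attr_cls = np.mean([cross_entropy(l, y)
                          for l, y in zip(attr_logits, attr_labels)])
    w1, w2, w3, w4 = weights
    return (w1 * l_id_cls + w2 * loss_id_verif
            + w3 * l_attr_cls + w4 * loss_attr_verif)

# Example: uniform logits give log(2) cross-entropy per 2-way head;
# with zero verification losses the total is 2*log(2) ~= 1.386.
total = multi_task_loss(np.array([0.0, 0.0]), 0,
                        [np.array([0.0, 0.0])], [1],
                        loss_id_verif=0.0, loss_attr_verif=0.0)
```

In practice the relative weights balance how strongly the attribute branches regularize the identity embedding.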
Section snippets
Related work
This section briefly reviews several closely related lines of work: classification-based methods, verification-based methods and attribute-based methods.
Approach
In this section, we introduce our whole network architecture and the definition of the multiple tasks.
Datasets and evaluation metrics
The Market1501 dataset [1] is a large-scale person re-ID dataset containing 32,668 gallery images and 3368 query images captured by 6 cameras. Each annotated identity appears in at least two cameras, so that cross-camera search can be performed. The images are automatically detected by the deformable part model (DPM) rather than hand-drawn bounding boxes, which is closer to a realistic setting. According to the default setting, there are 12,936 cropped images of 751 identities for
Conclusions
In this paper, we propose a novel multi-task learning network for person re-identification that learns multiple complementary kinds of information end-to-end. The four sub-tasks mutually benefit from the multi-task learning procedure. The multi-task network combines an identity classification loss, an identity verification loss, an attribute classification loss and an attribute verification loss. We introduce the attribute loss to improve the detailed discriminant ability
Acknowledgments
This work was supported in part by the Natural Science Foundation of China under Grants U1536203 and 61672254, in part by the National Key Research and Development Program of China (2016QY01W0200), and in part by the Major Scientific and Technological Project of Hubei Province (2018AAA068).
Hefei Ling obtained the B.S., M.S. and Ph.D. degrees from Huazhong University of Science and Technology (HUST), China, in 1999, 2002 and 2005, respectively. He is currently a professor in the School of Computer Science and Technology, HUST. Prof. Ling served as a visiting professor at University College London from 2008 to 2009. He has published more than 100 papers. He now serves as director of the Digital Media and Intelligent Technology Research Institute.
References (52)
- et al., Covariance descriptor based on bio-inspired features for person re-identification and face verification, IVC, 2014.
- et al., Person re-identification by attributes, Proceedings of the BMVC, 2014.
- et al., Scalable person re-identification: a benchmark, Proceedings of the ICCV, 2015.
- et al., Unlabeled samples generated by GAN improve the person re-identification baseline in vitro, Proceedings of the ICCV, 2017.
- et al., Salient color names for person re-identification, Proceedings of the ECCV, 2014.
- et al., Person re-identification by local maximal occurrence representation and metric learning, Proceedings of the CVPR, 2015.
- et al., SVDNet for pedestrian retrieval, Proceedings of the ICCV, 2017.
- et al., Face and palmprint pixel level fusion and Kernel DCV-RBF classifier for small sample biometric recognition, Pattern Recognit., 2007.
- et al., A face and palmprint recognition approach based on discriminant DCT feature extraction, IEEE Trans. Syst. Man Cybern. Part B (Cybern.), 2004.
- et al., Video-based person re-identification by simultaneously learning intra-video and inter-video distance metrics, IEEE Trans. Image Process., 2018.
- Image to video person re-identification by learning heterogeneous dictionary pair with feature projection matrix, IEEE Trans. Inf. Forensics Secur.
- Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning, IEEE Trans. Image Process.
- Person re-identification by probabilistic relative distance comparison, Proceedings of the CVPR.
- Large scale metric learning from equivalence constraints, Proceedings of the CVPR.
- Learning locally-adaptive decision functions for person verification, Proceedings of the CVPR.
- Matching people across camera views using kernel canonical correlation analysis, Proceedings of the ICDSC.
- Efficient PSD constrained asymmetric metric learning for person re-identification, Proceedings of the ICCV.
- Person re-identification with correspondence structure learning, Proceedings of the ICCV.
- An improved deep learning architecture for person re-identification, Proceedings of the CVPR.
- Learning deep feature representations with domain guided dropout for person re-identification, Proceedings of the CVPR.
- Joint learning of single-image and cross-image representations for person re-identification, Proceedings of the CVPR.
- Gated siamese convolutional neural network architecture for human re-identification, Proceedings of the ECCV.
- Deep supervised hashing for fast image retrieval, Proceedings of the CVPR.
- DeepReID: deep filter pairing neural network for person re-identification, Proceedings of the CVPR.
Ziyang Wang received the B.E. degree in Software Engineering from Shandong University, Weihai, China, in 2016. He is currently pursuing the M.S. degree at Huazhong University of Science and Technology, Wuhan, China. His research interests include computer vision and multimedia data analysis, such as large-scale multimedia indexing and retrieval.
Ping Li is a lecturer in the School of Computer Science and Technology, Huazhong University of Science and Technology (HUST). He received his Ph.D. degree in Computer Application from HUST in 2009. His research interests include multimedia security, image retrieval and machine learning.
Yuxuan Shi is currently a Ph.D. student in the School of Computer Science and Technology at Huazhong University of Science and Technology. He received the B.S. degree in electronic information engineering from Wuhan University of Science and Technology, Wuhan, China. He received his M.S. in traffic information engineering and control from Wuhan University of Technology. His research interests include computer vision and multimedia data analysis, such as large-scale multimedia indexing and retrieval.
Jiazhong Chen received his M.S. and Ph.D. degrees from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999 and 2003. He is currently an associate professor in the School of Computer Science and Technology, HUST. His current research interests include computer vision, image processing, machine learning, and multimedia communications.