Elsevier

Neurocomputing

Volume 347, 28 June 2019, Pages 109-118

Improving person re-identification by multi-task learning

https://doi.org/10.1016/j.neucom.2019.01.027

Abstract

For person re-identification (re-ID), the core task is to learn effective representations of a person image. Since multi-task learning can achieve strong performance in seeking robust features, we propose a novel Multi-Task Learning Network (MTNet) with four different losses for re-ID. MTNet is an end-to-end deep learning framework in which all parameters and losses are jointly optimized. Our method combines two tasks closely related to person re-identification, the pedestrian identity task and the pedestrian attribute task, which provide complementary information from different perspectives by integrating multiple contexts. Attributes focus on specific aspects of a person, while identity pays more attention to overall contour and appearance. Meanwhile, both classification and verification losses are employed to optimize the distances between samples: identification losses construct a large class space, while verification losses refine that space by minimizing the distance between similar images and maximizing the distance between dissimilar ones. In the experiments, MTNet achieves state-of-the-art results on two typical datasets, Market1501 [1] and DukeMTMC-reID [2].

Introduction

Person re-identification is of potential significance in security applications. It is usually treated as an image retrieval problem, which matches a person across different cameras and ranks the gallery images according to their similarities. Existing methods mainly focus on extracting robust representations [3], [4], [5], [6], [7], [8], [9], [10], [11] and learning matching functions or metrics [5], [12], [13], [14], [15], [16], [17]. Given its excellent performance on computer vision tasks, deep learning has also been adopted by the re-ID community [6], [18], [19], [20], [21] and has produced promising results.

This paper aims at learning robust representations and improving the performance of person re-ID on large-scale datasets. Both person identity and attributes are essential information in surveillance videos. In general, the identity feature can be regarded as a global feature that depends on the overall contour and appearance, while attributes describe features related only to certain aspects of a pedestrian. Identity features carry overall information, whereas attribute features focus on details; hence identity features are more effective for re-ID, and attribute features are more effective for attribute recognition. As mentioned in [22], multi-task learning can obtain more robust features, and these two tasks complement each other. As shown in Fig. 1, the identity task and the attribute task can help each other achieve better results. For example, the classification system fails on the first query example because of similar appearance, such as the yellow upper clothing and black shorts. Nevertheless, with the gender attribute, the boy in yellow clothing is excluded.

To our knowledge, there are three kinds of deep learning methods for person re-identification. The classification model [1], [6], [19] aims to distinguish different samples. In these methods, images from different categories may lie very close to each other in the feature space, which makes it difficult and challenging to correctly identify new samples or new identities. The verification model [23], [24] is therefore proposed to minimize the distance between similar images and maximize the distance between dissimilar ones. However, verification models are weak at expanding the feature space. Besides, a number of previous works [18], [25] treat person re-identification as a binary classification task or a similarity regression task: given an input pair of images, the network determines whether the two images show the same person.
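The complementary roles of the two loss families can be illustrated with a minimal NumPy sketch (the function names and the contrastive margin are illustrative assumptions, not the paper's exact formulation): a softmax cross-entropy identification loss builds the class space, while a contrastive-style verification loss shapes pairwise distances.

```python
import numpy as np

def identification_loss(logits, label):
    # Softmax cross-entropy: pushes each image toward its identity class,
    # carving out a large class space.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def verification_loss(f1, f2, same, margin=1.0):
    # Contrastive-style loss: pulls features of the same person together
    # and pushes different persons at least `margin` apart.
    d = np.linalg.norm(f1 - f2)
    return 0.5 * d**2 if same else 0.5 * max(0.0, margin - d)**2
```

Note that the verification loss vanishes for a dissimilar pair already separated by more than the margin, which is exactly why it cannot expand the feature space on its own.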

In our paper, MTNet takes advantage of the identity and attribute tasks by combining the classification and verification approaches. In our framework, verification is embedded in the attribute branch, which compensates, to a certain extent, for the shortcomings of a plain attribute identification model. As shown in Fig. 1, we define an identity label and a set of attribute labels for every pedestrian (the second boxes). Based on these labels, we use four independent branches to train a robust multi-task network (the third boxes), which then performs both the person re-identification task and the person attribute recognition task (the fourth boxes).

Our main contributions are summarized as follows:

  • We propose a multi-task learning network (MTNet). It simultaneously learns an end-to-end CNN embedding for person re-ID and an attribute prediction model. As shown in Fig. 2, four different tasks are integrated into MTNet: identity classification, identity verification, attribute classification and attribute verification.

  • We employ a verification loss for the attribute task, which to our knowledge is the first time this has been done in the field of person re-ID. The attribute verification task not only assists the attribute classification task but also promotes the identity verification task.

  • We achieve state-of-the-art results on two large-scale person re-ID datasets, Market1501 [1] and DukeMTMC-reID [2].
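The joint objective behind the four branches can be sketched as a weighted sum of the four losses. This is a minimal NumPy sketch under stated assumptions: the loss weights, the contrastive margin, and the use of one softmax per attribute are illustrative choices, since the section snippets do not give the exact weighting.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for one sample.
    z = logits - logits.max()
    return -(z - np.log(np.exp(z).sum()))[label]

def contrastive(f1, f2, same, margin=1.0):
    # Verification loss on a feature pair.
    d = np.linalg.norm(f1 - f2)
    return 0.5 * d**2 if same else 0.5 * max(0.0, margin - d)**2

def mtnet_loss(id_logits, id_label, attr_logits, attr_labels,
               id_pair, attr_pair, same_id, same_attr,
               lambdas=(1.0, 1.0, 1.0, 1.0)):
    # Four jointly optimized sub-task losses (equal weights assumed).
    l_idc = cross_entropy(id_logits, id_label)            # identity classification
    l_idv = contrastive(*id_pair, same_id)                # identity verification
    l_atc = sum(cross_entropy(lg, lb)                     # attribute classification
                for lg, lb in zip(attr_logits, attr_labels))
    l_atv = contrastive(*attr_pair, same_attr)            # attribute verification
    return sum(w * l for w, l in zip(lambdas, (l_idc, l_idv, l_atc, l_atv)))
```

Because all four terms are differentiable functions of the shared CNN features, the whole network can be trained end-to-end with a single backward pass, as the abstract states.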


Related work

This section briefly reviews several closely related lines of work: classification-based methods, verification-based methods and attribute-based methods.

Approach

In this section, we introduce our whole network architecture and the definition of the multiple tasks.

Datasets and evaluation metrics

The Market1501 dataset [1] is a large-scale person re-ID dataset containing 32,668 gallery images and 3368 query images captured by 6 cameras. Each annotated identity appears in at least two cameras, so that cross-camera search can be performed. The images are automatically detected by the deformable part model (DPM) instead of hand-drawn bounding boxes, which is closer to a realistic setting. Following the default setting, there are 12,936 cropped images of 751 identities for training.
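The standard evaluation metrics for these datasets, cumulative matching characteristic (CMC) and mean average precision (mAP), can be computed as in this minimal single-query sketch. Note this is a simplification: the official protocol also filters out gallery images from the same camera as the query, which is omitted here for brevity.

```python
import numpy as np

def evaluate(dist, q_ids, g_ids):
    """Single-query CMC rank-1 and mAP from a query-by-gallery
    distance matrix (rows: queries, columns: gallery)."""
    aps, rank1 = [], 0
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])                       # rank gallery by distance
        matches = (g_ids[order] == q_ids[i]).astype(float)
        if matches[0] == 1:                               # correct match at rank 1
            rank1 += 1
        hits = np.cumsum(matches)
        precision = hits / (np.arange(len(matches)) + 1)  # precision at each rank
        aps.append((precision * matches).sum() / max(matches.sum(), 1))
    return rank1 / dist.shape[0], float(np.mean(aps))
```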

Conclusions

In this paper, we propose a novel multi-task learning network for person re-identification that learns multiple kinds of complementary information end-to-end. The four different sub-tasks mutually benefit from the multi-task learning procedure. The network includes an identity classification loss, an identity verification loss, an attribute classification loss and an attribute verification loss. We introduce the attribute losses to improve the detailed discriminative ability of the learned features.

Acknowledgments

This work was supported in part by the Natural Science Foundation of China under Grants U1536203 and 61672254, in part by the National Key Research and Development Program of China (2016QY01W0200), and in part by the Major Scientific and Technological Project of Hubei Province (2018AAA068).

Hefei Ling obtained the B.S., M.S., and Ph.D. degrees from Huazhong University of Science and Technology (HUST), China, in 1999, 2002 and 2005, respectively. He is currently a professor in the School of Computer Science and Technology, HUST. Prof. Ling was a visiting professor at University College London from 2008 to 2009. He has published more than 100 papers. He now serves as director of the Digital Media and Intelligent Technology Research Institute.

References (52)

  • B. Ma et al., Covariance descriptor based on bio-inspired features for person re-identification and face verification, IVC (2014)
  • R. Layne et al., Person re-identification by attributes, Proceedings of the BMVC (2014)
  • L. Zheng et al., Scalable person re-identification: a benchmark, Proceedings of the ICCV (2015)
  • Z. Zheng et al., Unlabeled samples generated by GAN improve the person re-identification baseline in vitro, Proceedings of the ICCV (2017)
  • Y. Yang et al., Salient color names for person re-identification, Proceedings of the ECCV (2014)
  • S. Liao et al., Person re-identification by local maximal occurrence representation and metric learning, Proceedings of the CVPR (2015)
  • Y. Sun et al., SVDNet for pedestrian retrieval, Proceedings of the ICCV (2017)
  • X.-Y. Jing et al., Face and palmprint pixel level fusion and Kernel DCV-RBF classifier for small sample biometric recognition, Pattern Recognit. (2007)
  • X.-Y. Jing et al., A face and palmprint recognition approach based on discriminant DCT feature extraction, IEEE Trans. Syst. Man Cybern. Part B (Cybern.) (2004)
  • X. Zhu et al., Video-based person re-identification by simultaneously learning intra-video and inter-video distance metrics, IEEE Trans. Image Process. (2018)
  • X. Zhu et al., Image to video person re-identification by learning heterogeneous dictionary pair with feature projection matrix, IEEE Trans. Inf. Forensics Secur. (2018)
  • X.-Y. Jing et al., Super-resolution person re-identification with semi-coupled low-rank discriminant dictionary learning, IEEE Trans. Image Process. (2017)
  • W.S. Zheng et al., Person re-identification by probabilistic relative distance comparison, Proceedings of the CVPR (2011)
  • M. Köstinger et al., Large scale metric learning from equivalence constraints, Proceedings of the CVPR (2012)
  • Z. Li et al., Learning locally-adaptive decision functions for person verification, Proceedings of the CVPR (2013)
  • G. Lisanti et al., Matching people across camera views using Kernel canonical correlation analysis, Proceedings of the ICDSC (2014)
  • S. Liao et al., Efficient PSD constrained asymmetric metric learning for person re-identification, Proceedings of the ICCV (2015)
  • Y. Shen et al., Person re-identification with correspondence structure learning, Proceedings of the ICCV (2015)
  • E. Ahmed et al., An improved deep learning architecture for person re-identification, Proceedings of the CVPR (2015)
  • T. Xiao et al., Learning deep feature representations with domain guided dropout for person re-identification, Proceedings of the CVPR (2016)
  • F. Wang et al., Joint learning of single-image and cross-image representations for person re-identification, Proceedings of the CVPR (2016)
  • R.R. Varior et al., Gated siamese convolutional neural network architecture for human re-identification, Proceedings of the ECCV (2016)
  • S. Ruder, An overview of multi-task learning in deep neural networks, in: ...
  • H. Liu et al., Deep supervised hashing for fast image retrieval, Proceedings of the CVPR (2016)
  • A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person re-identification, in: ...
  • W. Li et al., DeepReID: deep filter pairing neural network for person re-identification, Proceedings of the CVPR (2014)

Ziyang Wang received the B.E. degree in Software Engineering from Shandong University, Weihai, China, in 2016. He is currently pursuing the M.S. degree at Huazhong University of Science and Technology, Wuhan, China. His research interests include computer vision and multimedia data analysis, such as large-scale multimedia indexing and retrieval.

Ping Li is a lecturer in the School of Computer Science and Technology, Huazhong University of Science and Technology (HUST). He received his Ph.D. degree in Computer Application from HUST in 2009. His research interests include multimedia security, image retrieval and machine learning.

Yuxuan Shi is currently a Ph.D. student in the School of Computer Science and Technology at Huazhong University of Science and Technology. He received the B.S. degree in electronic information engineering from Wuhan University of Science and Technology, Wuhan, China, and his M.S. in traffic information engineering and control from Wuhan University of Technology. His research interests include computer vision and multimedia data analysis, such as large-scale multimedia indexing and retrieval.

Jiazhong Chen received his M.S. and Ph.D. degrees from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999 and 2003. He is currently an associate professor in the School of Computer Science and Technology, HUST. His current research interests include computer vision, image processing, machine learning, and multimedia communications.
