Scale-invariant batch-adaptive residual learning for person re-identification☆
Introduction
The problem of person re-identification (re-ID) deals with appearance-based matching of pedestrians across two camera views, termed probe and gallery [1], [2]. The problem is difficult because the same person can look very different in these views due to variations in pose, illumination and viewpoint. Furthermore, a person can also undergo scale variations, which add further distortions to their appearance attributes. Variations in scale may arise from factors such as inaccurate localization of a person within a detected bounding box and differences in the physical distance of a person from different cameras in the 3D world. This scale variation greatly increases the complexity of the re-ID task.
Several existing methods have addressed the re-ID problem with two main focuses, namely, (a) robust feature descriptor generation [1], [3] and (b) better metric learning [4], [5], [6]. However, none of these works has explicitly addressed the issue of scale variation. Typically, existing approaches train a deep Siamese neural network for robust feature extraction. Residual networks (ResNets) [7] have emerged as a popular deep network because they yield performance comparable to other deep models with far fewer model parameters. Another crucial factor in achieving accurate person re-ID is metric learning. The triplet loss function has become popular for metric learning [8]. More recently, the authors of [6] used a batch-hard triplet loss to mine the most relevant (hardest) triplets within a batch. However, even such a model is found to be susceptible to outliers.
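As a rough illustration of the batch-hard mining mentioned above, the sketch below picks, for each anchor in a batch, the farthest same-identity sample and the closest different-identity sample. It is a simplified NumPy reading of the idea in [6], not the implementation used in this paper; the function names, the plain Euclidean distance, the margin of 0.3 and the toy embeddings are all assumptions made for the example.

```python
import numpy as np


def pairwise_distances(emb):
    """Euclidean distance matrix between the rows of emb (batch x dim)."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Mean hinge loss over the hardest triplet of every anchor in the batch.

    Assumes every identity appears at least twice in the batch and that
    at least two identities are present.
    """
    emb = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    dist = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    # Hardest positive: the farthest sample sharing the anchor's identity.
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    # Hardest negative: the closest sample with a different identity.
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    return float(np.maximum(0.0, margin + hardest_pos - hardest_neg).mean())


# Toy batch (made-up values): two identities, two 2-D embeddings each,
# deliberately interleaved along the x-axis so every anchor is "hard".
toy_embeddings = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
toy_labels = [0, 0, 1, 1]
toy_loss = batch_hard_triplet_loss(toy_embeddings, toy_labels, margin=0.3)
```

Because only one positive and one negative per anchor contribute to the loss, a single mislabeled or atypical sample can dominate the selection, which is one way to read the susceptibility to outliers noted above.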
In this paper, we address the problem of scale-invariant person re-ID using scale-invariant residual networks and a new loss function for deep metric learning. Our main contributions are the following: (a) we propose two scale-invariant residual architectures and establish that these networks have better gradient activation than conventional ResNets; and (b) we introduce a batch-adaptive triplet loss function with better triplet mining capability (compared with the existing batch-hard triplet loss [6]) within a Siamese configuration.
Related work
Most earlier works in re-ID [1], [4] are based on hand-crafted systems. However, they have failed to yield good results in complex scenarios. Inspired by the excellent performance of Convolutional Neural Networks (CNNs) in image classification tasks, many recent works in re-ID have explored deep learning as part of their solution; see, for example, [2], [9], [10], [11]. However, the accuracy of such models is limited by the unavailability of large training data.
Proposed architecture
Feature detectors (filter kernels) in a CNN are able to detect relevant features irrespective of their spatial locations. However, this behavior (at a given scale) cannot be automatically guaranteed when dealing with more than one scale. Learning feature detectors that respond to similar patterns at multiple scales is likely to improve the recognition task in current deep architectures. Such a neural network can be termed a scale-invariant convolution neural
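To make the scale issue concrete, the sketch below applies one shared filter kernel to an input at several scales and keeps the strongest response. This is a generic illustration of filters that respond to similar patterns at multiple scales, not the scale-invariant residual block proposed in this paper (the excerpt does not specify it); the helper names, the average-pooling downsampler, the center-surround kernel and the toy images are all assumptions.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation of image with kernel."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def downsample(image, factor):
    """Average-pool by an integer factor, cropping any leftover rows/columns."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def multiscale_max_response(image, kernel, factors=(1, 2)):
    """Strongest response of one shared kernel over all positions and scales."""
    return max(
        float(correlate2d_valid(downsample(image, f), kernel).max())
        for f in factors
    )


# Center-surround "blob" detector: +1 at the center, -1/8 on each neighbor.
kernel = np.full((3, 3), -1.0 / 8.0)
kernel[1, 1] = 1.0

# The same pattern at two scales: a 1-pixel dot and its 2x enlarged copy.
small = np.zeros((8, 8))
small[3, 3] = 1.0
large = np.kron(small, np.ones((2, 2)))

single_small = float(correlate2d_valid(small, kernel).max())
single_large = float(correlate2d_valid(large, kernel).max())
multi_small = multiscale_max_response(small, kernel)
multi_large = multiscale_max_response(large, kernel)
```

In this toy setting the single-scale response drops when the pattern is enlarged, while the maximum over scales is the same for both inputs.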
Proposed deep metric learning
Weinberger and Saul [5] proposed metric learning for k-nearest neighbor classification via the “Large Margin Nearest Neighbor (LMNN)” loss. A problem with LMNN is that it cannot properly handle fixed target neighbors. FaceNet [8] introduced the “triplet loss”, which is more suitable for deep metric learning. If g_a, g_p and g_n are an arbitrary anchor, positive and negative (triplet) person set, and D_{a,p} and D_{a,n} represent the similarity measure (squared Euclidean distance) between the embeddings of the pairs (g_a,
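For reference, the standard triplet loss in the style of [8] can be sketched as follows: it penalizes a triplet whenever the anchor-positive distance is not smaller than the anchor-negative distance by at least a margin. This is the generic formulation, not the batch-adaptive loss proposed in this paper; the function names, the margin value and the toy embeddings are illustrative assumptions.

```python
import numpy as np


def squared_euclidean(x, y):
    """Squared Euclidean distance between two embedding vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.dot(diff, diff))


def triplet_loss(f_a, f_p, f_n, margin=0.2):
    """Hinge-style triplet loss: max(0, D_ap - D_an + margin)."""
    d_ap = squared_euclidean(f_a, f_p)  # anchor-positive distance
    d_an = squared_euclidean(f_a, f_n)  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)


# Toy 2-D embeddings (made-up values).
anchor = [0.0, 0.0]
positive = [1.0, 0.0]  # D_ap = 1
negative = [0.0, 2.0]  # D_an = 4

# Well-separated triplet: 1 - 4 + 0.5 < 0, so the loss is zero.
easy = triplet_loss(anchor, positive, negative, margin=0.5)
# Swapping the roles violates the margin: 4 - 1 + 0.5 = 3.5.
violated = triplet_loss(anchor, negative, positive, margin=0.5)
```

Triplets that already satisfy the margin contribute nothing to the loss, which is why the choice of triplets within a batch matters for training.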
Triplet siamese configuration
Deeper neural networks generally perform better than shallower ones, but they are difficult to train because of over-fitting caused by insufficient data. A better option is to train the network under consideration on top of existing pre-trained models. In this work, we propose two SI residual networks, one deeper and one shallower. The deeper SI network is developed from the pre-trained ResNet-50 [7] architecture while the shallower network is built from stacking
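The key property of a triplet Siamese configuration is that the anchor, positive and negative inputs all pass through one network with shared weights. The minimal sketch below uses a single random linear layer with L2 normalization as a stand-in for the backbone; the class name, dimensions and random inputs are hypothetical and are not the SI residual networks described here.

```python
import numpy as np


class SharedEmbedding:
    """Stand-in backbone: one linear layer followed by L2 normalization."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(size=(in_dim, out_dim))

    def __call__(self, x):
        z = np.asarray(x, dtype=float) @ self.weights
        return z / np.linalg.norm(z, axis=-1, keepdims=True)


def triplet_forward(net, anchor, positive, negative):
    """Embed all three inputs with the same network (shared weights)."""
    return net(anchor), net(positive), net(negative)


net = SharedEmbedding(in_dim=8, out_dim=4)
rng = np.random.default_rng(1)
a, p, n = rng.normal(size=(3, 8))  # made-up input feature vectors
f_a, f_p, f_n = triplet_forward(net, a, p, n)
```

Since there is only one set of weights, any update computed from a triplet changes the embedding of all three branches at once.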
Experimental results
In this section, we present a brief discussion of the datasets, followed by the evaluation protocol and training details. Then, we show detailed comparisons with several state-of-the-art methods and also include two ablation studies.
Conclusion
In this paper, we proposed two scale-invariant residual networks for robust person re-ID tasks. We also introduced a new triplet loss function for better metric learning. Superior performance over the current state-of-the-art approaches on the benchmark Market-1501 and CUHK03 datasets indicates the effectiveness of our formulation. In the future, we plan to extend the proposed framework to re-ID problems in open settings.
Declaration of competing interest
We hereby declare that we do not have any conflict of interest for this manuscript.
References (25)
- et al. Multilevel triplet deep learning model for person re-identification. Pattern Recognit. Lett. (2019)
- et al. Learning a discriminative null space for person re-identification. CVPR (2016)
- et al. Gated siamese convolutional neural network architecture for human re-identification. ECCV (2016)
- et al. Large scale metric learning from equivalence constraints. CVPR (2012)
- et al. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. (2009)
- A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person re-identification, (2017)....
- et al. Deep residual learning for image recognition. CVPR (2016)
- et al. FaceNet: A unified embedding for face recognition and clustering. CVPR (2015)
- et al. Deep mutual learning. CVPR (2018)
- et al. Harmonious attention network for person re-identification. CVPR (2018)
- Resource aware person re-identification across multiple resolutions. CVPR
- Person re-identification by deep joint learning of multi-loss classification. Proceedings of IJCAI
Cited by (11)
Counterfactual attention alignment for visible-infrared cross-modality person re-identification
2023, Pattern Recognition Letters
AVPL: Augmented visual perception learning for person Re-identification and beyond
2022, Pattern Recognition
Citation Excerpt: And the object function required that the feature distance of positive sample pairs must be smaller than that of the negative sample pairs. Sikdar et al. [27] propose a batch adaptive triplet loss with better triplet mining capability within a Siamese configuration. In this work, the positives are weighted based on their hardness with respect to the anchor (i.e. similarity).
HMMN: Online metric learning for human re-identification via hard sample mining memory network
2021, Engineering Applications of Artificial Intelligence
Citation Excerpt: In addition, Li et al. (2018), Lawen et al. (2020), Zhou et al. (2021) and Li et al. (2021) have been published describing the design of small networks for person re-identification problem to reduce heavy computational resource consumption, which is more suitable for edge computing. It is worth noting the jobs (Sikdar et al., 2020; Sikdar and Chowdhury, 2020), which introduce an open-set person re-identification problem and propose batch-adaptive triplet mining technique for person re-identification. Traditional or close-set re-ID systems are not equipped to handle such cases and raise several false alarms as a result.
Appearance feature enhancement for person re-identification
2021, Expert Systems with Applications
Citation Excerpt: E.g., Lv et al. (Lv, Li, Nai, Chen, & Yuan, 2020) proposed an expanded neighborhoods distance (END) to re-rank the re-ID results to address the problem of low intra-class similarity and high inter-class similarity. Besides, some researchers (Sikdar & Chowdhury, 2020) carefully designed the scale-invariant residual network to extract scale-invariant deep features. Also some researchers deploy the CNN architecture into unsupervised Re-ID tasks.
Deep learning algorithms for person re-identification: sate-of-the-art and research challenges
2024, Multimedia Tools and Applications
Unsupervised learning of local features for person re-identification with loss function
2023, International Journal of Autonomous and Adaptive Communications Systems
☆ Handled by Associate Editor S. Sarkar.