From pedestrian to group retrieval via siamese network and correlation
Introduction
In recent years, person re-identification (re-id), which refers to recognizing a specific pedestrian captured across a network of non-overlapping surveillance views, has attracted considerable attention in computer vision and pattern recognition. However, the current literature assumes that target pedestrians are captured separately by the surveillance system, without other anonymous pedestrians nearby. Existing re-id studies focus on extracting discriminative features from individual appearance, and their performance may decline markedly in congested scenarios in which occlusions and identity confusion occur.
According to the study in [1], about 70% of people in real scenes move in groups. This social phenomenon means that people in most realistic scenarios tend to behave as a group rather than as individuals. For example, as shown in Fig. 1, people may walk along the road together with friends or strangers, which may lead to occlusion of their bodies. In this case, it can be inferred that the probability of mismatching a person's identity could be reduced if the target person were supplemented with useful correlations from the surrounding group, which are unrelated to the distribution of the surveillance system [2], [3], [4], [5]. Therefore, to make further improvement, it is necessary and significant to find an efficient correlation between a person and the corresponding group.
A group consists of several specific pedestrians who maintain high motion collectiveness over a continuous period of time; it is the elementary unit that constitutes a crowd [6]. This study addresses the novel task of pedestrian group retrieval, which refers to re-identifying a target group of persons that has been observed across a network of surveillance cameras. That is, given a query group image from one scene, the task is to correctly match the gallery images from other scenes that have the same group identity. Group retrieval can be applied in many areas, such as anomaly detection, multiple object tracking [7], crowd behavior understanding [6], and suspicious activity analysis. Specifically, research on group retrieval can provide useful cues for monitoring moving criminal groups through surveillance. In addition, surveillance technology has made tremendous progress and has been deployed in a wide range of scenarios, including supermarkets, roads, train stations, and airports, which greatly helps in addressing the group retrieval problem [8].
The group retrieval problem involves many difficult challenges. There is no overlapping view among different cameras because they are far apart from each other, which leads to uncertain viewpoint variations and diverse illumination changes [9], [10]. In addition, group retrieval requires identifying multiple pedestrians with diverse poses and interactions among intra-group members. Considering that groups are the primary components of a crowd, a large, changing crowd can be decomposed into unique groups with fixed members. Thus, this paper assumes that the members of a specific group are stable and remain unique across cameras for the duration of the group retrieval task.
There are fundamental differences between group retrieval and person re-id: group retrieval re-identifies a group of several correlated pedestrians with high collectiveness as a whole, whereas person re-id [11] only re-identifies a specific pedestrian individually. Moreover, in daily life people tend to move together as a collective group, so there is much more interaction among intra-group members than in the person re-id setting. These interactive motions take diverse forms, such as chatting or exchanging positions, and contain useful cues for retrieval. Person re-id does not handle the internal correlations of a group, so traditional person re-id approaches cannot directly solve the group retrieval task. As shown in Fig. 2, unlike person re-id, which relies heavily on the appearance cues of each individual, group retrieval pays more attention to the correlation among its members, which relieves the interference caused by that dependence and compensates for this weakness of person re-id.
However, some challenging bottlenecks remain in group retrieval and in associating re-id with group constraints. On the one hand, because group retrieval lacks a public benchmark for training and fair evaluation, few works have studied the group retrieval problem. On the other hand, traditional person re-id benchmarks cannot evaluate the relationship between group retrieval and re-id because they do not use the association among groups. It is therefore necessary to design a comprehensive re-id benchmark with group correlations and to explore how to improve re-id performance with group information.
To address these issues, this paper makes four contributions:
- Two novel datasets are proposed to build a comprehensive group retrieval benchmark. One is a large and challenging group retrieval dataset named “SYSU-Group”. The other is a group-associated re-id dataset named “Group-reID”, which is linked to SYSU-Group through group retrieval information. To our knowledge, it is the first dataset to associate group attributes with single-person re-id.
- A Siamese Verification-Identification-based Group Retrieval (SVIGR) method is proposed to extract discriminative pedestrian features; pedestrian-to-group matching is then realized through the group distance vector (GDV) generated by the minimum distance matching principle.
- An efficient group-associated method named Group Retrieval Correlation (GRC) is proposed to improve re-id by using the underlying group information. Moreover, two strategies are proposed to enhance the robustness and generalization of GRC, which improves re-id performance by a large margin.
- Two schemes are proposed to annotate the two proposed datasets: the group identity (Group-ID) and the person identity (Person-ID). They are used to train SVIGR models and to extract naive re-id features, respectively, and thereby realize the goals of group retrieval and group-guided person re-id.
This paper merges our two related shorter conference works [12], [13] into a unified framework and expands them into a more comprehensive work. [12] only uses the Group-ID scheme to realize group retrieval and does not address the group-aided person re-id problem. This manuscript not only addresses both issues, but also provides two different strategies, named G2Q and Q2G, to enhance the robustness of GRC, which [13] did not address. Moreover, whereas [12], [13] are evaluated only on the proposed SYSU-Group dataset, this paper further uses the recent state-of-the-art Road Group [14] and i-LIDS [4] datasets to validate the generalization ability of the proposed method. More technical details, theoretical analysis, and experimental evaluations on various benchmarks are also provided.
Section snippets
Related works
In this section, the related works are summarized on various aspects of group retrieval.
The proposed SYSU-Group group retrieval dataset
In this section, the proposed SYSU-Group group retrieval dataset is introduced, beginning with a description of the new dataset, followed by its ground truth and evaluation protocol.
Overall framework
The proposed SVIGR method is divided into four steps. First, an advanced pedestrian detector is used to extract all individual persons from the groups. Second, a convolutional Siamese network is adopted to extract the features of the individual persons and to generate their mutual distances by calculating the cosine distance. Third, for each group, the mutual distances that belong to the group are merged to generate the GDV. Finally, the GDVs are concatenated into a group distance matrix to obtain the
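Steps two to four can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature vectors are assumed to come from some already trained network, the function names are hypothetical, and the way member distances are merged (each query member takes its minimum cosine distance to any member of the gallery group, and these minima are averaged) is only one plausible reading of the minimum distance matching principle.

```python
import numpy as np


def cosine_distance_matrix(a, b, eps=1e-12):
    """Pairwise cosine distance between rows of a (m, d) and rows of b (n, d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalize each row; eps guards against division by zero for null vectors.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return 1.0 - a_unit @ b_unit.T


def group_distance(query_feats, gallery_feats):
    """Assumed merge rule: mean over query members of the minimum
    distance to any member of the gallery group."""
    d = cosine_distance_matrix(query_feats, gallery_feats)
    return float(d.min(axis=1).mean())


def group_distance_vector(query_feats, gallery_groups):
    """Distances from one query group to every gallery group (a GDV-like vector)."""
    return np.array([group_distance(query_feats, g) for g in gallery_groups])


def group_distance_matrix(query_groups, gallery_groups):
    """Stack one distance vector per query group into a (queries, galleries) matrix."""
    return np.stack([group_distance_vector(q, gallery_groups) for q in query_groups])


# Toy example with 2-D features: two gallery groups, one query group.
gallery = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),    # same members as the query
    np.array([[-1.0, 0.0], [0.0, -1.0]]),  # dissimilar members
]
query = [np.array([[1.0, 0.0], [0.0, 1.0]])]

gdm = group_distance_matrix(query, gallery)
ranking = np.argsort(gdm, axis=1)  # gallery indices sorted by increasing distance
```

Sorting each row of the matrix in increasing order gives a ranked gallery list for the corresponding query group; other merge rules (for example, a one-to-one assignment between members) would slot into `group_distance` without changing the rest of the sketch.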
Framework of person re-identification in groups
In this section, the details of our group-aided person re-id approach, which assists re-id by using the proposed group retrieval correlation (GRC), are discussed. The pipeline of the proposed method is shown in Fig. 5. First, the naive features of single-person re-id are extracted by traditional feature extraction methods, including both hand-crafted methods and deep learning methods.
Second, these naive features are combined to adapt to the group association problem. Notably, the member-group
Experimental settings
The proposed SVIGR method is evaluated on three different datasets: our proposed SYSU-Group dataset, the i-LIDS dataset [4], and the Road Group dataset [14]. To train discriminative models, our Siamese network is implemented with four backbone CNN models pretrained on ImageNet [42]: CaffeNet [43], VGG16 [44], GoogleNet [45], and ResNet50 [41]. Their batch sizes are 128, 32, 64, and 16, respectively. After YOLOv3 processing, all obtained person images are resized to 256
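The backbone and batch-size pairing stated above can be recorded as a small configuration table. This is only an illustrative sketch: the dictionary name and the lookup helper are hypothetical, and no setting beyond the four batch sizes given in the text is assumed.

```python
# Batch size per backbone, as listed in the experimental settings.
BATCH_SIZES = {
    "CaffeNet": 128,
    "VGG16": 32,
    "GoogleNet": 64,
    "ResNet50": 16,
}


def batch_size_for(backbone):
    """Return the batch size for a backbone name (hypothetical helper)."""
    try:
        return BATCH_SIZES[backbone]
    except KeyError:
        raise ValueError(f"unknown backbone: {backbone!r}") from None
```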
Conclusion
In many public security applications, such as monitoring group criminal activity, group retrieval plays an important role in providing useful cues by identifying target groups across multiple cameras in the surveillance system. To address this issue, this paper first contributes a comprehensive benchmark including a novel group retrieval dataset named “SYSU-Group” and a group-guided re-id dataset named “Group-reID”. Both datasets consider realistic challenges such as viewpoint
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Ling Mei: Conceptualization, Methodology, Data curation, Writing - original draft. Jianhuang Lai: Project administration. Zhanxiang Feng: Visualization, Validation, Writing - review & editing. Xiaohua Xie: Formal analysis, Investigation.
Acknowledgment
This work was supported by the National Key Research and Development Program of China (2016YFB1001003), the NSFC (61902444, 61876104), and the International Program Fund for Young Talent Scientific Research People, Sun Yat-sen University.
References (53)
- et al. Covariance descriptor based on bio-inspired features for person re-identification and face verification. IVC (2014)
- et al. Alignedreid++: dynamically matching local information for person re-identification. Pattern Recogn. (2019)
- The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PloS One (2010)
- A. Bialkowski, P. Lucey, X. Wei, S. Sridharan, Person re-identification using group information, in: DICTA, 2013, pp....
- S. M. Assari, H. Idrees, M. Shah, Human re-identification in crowd videos using personal, social and environmental...
- et al. Associating groups of people. BMVC (2009)
- et al. Group re-identification with multigrained matching and integration. IEEE Trans. Cybern. (2019)
- et al. Measuring crowd collectiveness via global motion correlation. ICCV Workshops (2019)
- E. Ristani, C. Tomasi, Features for multi-target multi-camera tracking and re-identification, in: CVPR, 2018, pp....
- L. Zheng, Y. Yang, A. G. Hauptmann, Person re-identification: Past, present and future, arXiv preprint...
- Illumination-invariance optical flow estimation using weighted regularization transform. TCSVT
- Geodesic-based probability propagation for efficient optical flow. Electron. Lett.
- Siamese network for pedestrian group retrieval: a benchmark
- Person re-identification using group constrain
- Integral channel features. BMVC
- Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat.
- Person re-identification by iterative re-weighted sparse ranking. TPAMI
Ling Mei received his M.S. degree in pattern recognition and intelligent recognition from Sun Yat-sen University in 2016. He is currently pursuing the Ph.D. degree in information and communication engineering from Sun Yat-sen University, Guangdong, China. His research interests are in computer vision, image processing, crowd motion analysis, person re-identification, and optical flow. He has authored papers in IEEE TCSVT and ICPR, and has won the 2nd Prize of the first and second National Graduate Contest on Smart-City and Creative Design in 2014 and 2015.
Jianhuang Lai received the Ph.D. degree in mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an assistant professor, where he is currently a Professor of the School of Data and Computer Science. His current research interests are in the areas of computer vision, digital image processing, pattern recognition, multimedia communication and multiple target tracking. He has published over 100 scientific papers in the international journals and conferences on image processing and pattern recognition, e.g., IEEE TPAMI, IEEE TNN, IEEE TKDE, IEEE TIP, IEEE TSMC (Part B), IEEE TCSVT, Pattern Recognition, ICCV, CVPR, and ICDM. He serves as a deputy director of the Image and Graphics Association of China. He is a Fellow of Image and Graphics Society of China. He is also a senior member of the IEEE.
Zhanxiang Feng received the Ph.D. degree in information and communication engineering from Sun Yat-sen University, China, in 2018. He joined Sun Yat-sen University in 2019 as a post-doctoral fellow. His research interests include person re-identification, face recognition, face hallucination, image super-resolution, and visual surveillance. He has authored more than 10 papers in venues including IEEE TIP, Neurocomputing and ICPR.
Xiaohua Xie received his B.S. degree in mathematics and applied mathematics (2005) from Shantou University, and an M.S. degree in information and computing science (2007) and a Ph.D. degree in applied mathematics (2010) from Sun Yat-sen University in China. He is currently an Associate Professor at Sun Yat-sen University (SYSU). Prior to joining SYSU, he was an Associate Professor at Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences. His current research fields cover image processing, computer vision, pattern recognition, and computer graphics. He has published more than a dozen papers in prestigious international journals and conferences. He is recognized as Overseas High-Caliber Personnel (Level B) in Shenzhen, China.