Elsevier

Neurocomputing

Volume 412, 28 October 2020, Pages 447-460
From pedestrian to group retrieval via siamese network and correlation

https://doi.org/10.1016/j.neucom.2020.06.055

Abstract

In many public security applications such as anomaly detection, it is important to re-identify a group of pedestrians across other surveillance cameras, a task known as the group retrieval problem. Most previous studies focus on single-person re-identification (re-id) and ignore the correlations among group members, and the field lacks a large, comprehensive group retrieval benchmark that associates the two tasks. To address this issue, this paper focuses on solving the group retrieval problem and uses it to improve re-id. First, the paper builds a comprehensive benchmark for both group retrieval and the group-aided re-id task by proposing a novel pedestrian group retrieval dataset named “SYSU-Group” and a corresponding group-associated re-id dataset named “Group-reID”, which introduce realistic challenges such as variations of pose, viewpoint, illumination, and intra-group layout. The paper then proposes the Siamese Verification-Identification-based Group Retrieval (SVIGR) method, which combines verification and identification modules in a Siamese network to extract robust person features and follows the principle of minimum distance matching to realize group retrieval. Finally, a group-guided re-id method named group retrieval correlation (GRC) is proposed to improve re-id with additional group information. Experimental results on three different group retrieval benchmarks demonstrate the superiority and effectiveness of our method.

Introduction

In recent years, person re-identification (re-id), which refers to recognizing a specific pedestrian across a network of non-overlapping surveillance views, has attracted considerable attention in computer vision and pattern recognition. However, the current literature assumes that target pedestrians are captured separately by the surveillance system, without other anonymous pedestrians nearby. Existing re-id studies focus on extracting discriminative features from individual appearance, and their performance may decline markedly in congested scenarios where occlusions and identity confusion occur.

According to the study in [1], about 70% of people in real scenes tend to move in groups. This social phenomenon means that people in most realistic scenarios tend to behave as a group rather than as individuals. For example, as shown in Fig. 1, people may walk alongside friends or strangers on the road, which may lead to occlusions of their bodies. In such cases, the chance of mismatching a person's identity can be reduced if the target person is supplemented with useful correlations from the surrounding group, which do not depend on the distribution of the surveillance cameras [2], [3], [4], [5]. Therefore, finding an efficient correlation between a person and the corresponding group is a necessary and significant step toward further improvement.

A group consists of several specific pedestrians who maintain high motion collectiveness over a continuous period of time; it is an elementary unit of the crowd [6]. This study addresses the novel task of pedestrian group retrieval, which refers to re-identifying a target group of persons observed across a network of other surveillance cameras. That is, given a query group image from one scene, the task is to correctly match the gallery images from other scenes that share the same group identity. Group retrieval can be applied in many areas, such as anomaly detection, multiple object tracking [7], crowd behavior understanding [6], and suspicious activity analysis. Specifically, research on group retrieval can provide useful cues for monitoring a moving criminal group through surveillance. Meanwhile, surveillance technology has made tremendous progress and has been deployed in a wide range of scenarios including supermarkets, roads, train stations, and airports, which greatly helps in solving the group retrieval problem [8].

The group retrieval problem involves many challenges. Because the cameras are far apart, their views do not overlap, which leads to uncertain viewpoint variations and diverse illumination changes [9], [10]. In addition, group retrieval requires identifying multiple pedestrians whose poses and intra-group interactions vary. Since groups are the primary components of a crowd, a large, changing crowd can be decomposed into unique groups with fixed members. Thus, this paper assumes that the members of a specific group are stable and remain unique across cameras for the duration of the group retrieval task.

There are fundamental differences between group retrieval and person re-id. Group retrieval re-identifies several correlated pedestrians with high collectiveness as a whole, whereas person re-id [11] re-identifies one specific pedestrian individually. Moreover, in daily life people tend to move together as a collective group, so there is much more interaction among group members than person re-id accounts for. These interactions take diverse forms, such as chatting or exchanging positions, and contain useful cues for retrieval. Because person re-id does not handle the internal correlations of a group, traditional person re-id approaches cannot directly solve the group retrieval task. As shown in Fig. 2, unlike person re-id, which relies heavily on appearance cues, group retrieval pays more attention to the correlations among members, which compensates for the weakness of person re-id caused by its dependence on the appearance cues of each individual.

However, some bottlenecks remain for group retrieval and for associating re-id with group constraints. On the one hand, because group retrieval lacks a public benchmark for training and fair evaluation, few works have studied the problem. On the other hand, traditional person re-id benchmarks cannot evaluate the relationship between group retrieval and re-id because they do not use the associations within groups. It is therefore necessary to design a comprehensive re-id benchmark with group correlations and to explore how to improve re-id performance with group information.

To address these issues, this paper makes four contributions:

  • Two novel datasets are proposed to build a comprehensive group retrieval benchmark. One is a large and challenging group retrieval dataset named “SYSU-Group”. The other is a group-associated re-id dataset named “Group-reID”, which is linked to SYSU-Group through group retrieval information. To our knowledge, Group-reID is the first dataset to associate group attributes with single-person re-id.

  • The paper proposes a Siamese Verification-Identification-based Group Retrieval (SVIGR) method to extract discriminative pedestrian features, and then realizes pedestrian-to-group matching through the group distance vector (GDV) generated by the minimum distance matching principle.

  • An efficient group-associated method named Group Retrieval Correlation (GRC) is proposed to improve re-id by using the underlying group information. Moreover, two strategies are proposed to enhance the robustness and generalization of GRC, which improves re-id performance by a large margin.

  • Two schemes are proposed to annotate the two proposed datasets: the group identity (Group-ID) and the person identity (Person-ID). They are used to train SVIGR models and to extract naive re-id features, respectively, which in turn serve group retrieval and group-guided person re-id.

This paper fuses our two related shorter conference works [12], [13] into a unified framework and extends them into a more comprehensive work. [12] uses only the Group-ID scheme to realize group retrieval and does not address the group-aided person re-id problem. This manuscript not only addresses both issues but also provides two strategies, named G2Q and Q2G, to enhance the robustness of GRC, which [13] does not cover. Moreover, whereas [12], [13] are evaluated only on the proposed SYSU-Group dataset, this paper also uses the recent Road Group [14] and i-LIDS [4] datasets to further validate the generalization ability of the proposed method. More technical details, theoretical analysis, and experimental evaluations on various benchmarks are provided in this paper.

Related works

This section summarizes related works on various aspects of group retrieval.

The proposed SYSU-Group group retrieval dataset

This section introduces the proposed SYSU-Group group retrieval dataset, beginning with a description of the new dataset, followed by its ground truth and evaluation protocol.

Overall framework

The proposed SVIGR method is divided into four steps. First, an advanced pedestrian detector is used to extract all individual persons from the groups. Second, a convolutional Siamese network is adopted to extract features of the individual persons, and their mutual distances are computed as cosine distances. Third, for each group, the mutual distances that belong to the group are merged to generate the GDV. Finally, the GDVs are concatenated into a group distance matrix to obtain the
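
The distance-merging steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that person features have already been extracted (random or toy vectors stand in for Siamese network outputs), that the GDV of a query pedestrian holds the minimum cosine distance to any member of each gallery group, and that a query group is scored by averaging the GDVs of its members. The averaging step and all function names are hypothetical choices made for illustration only.

```python
import numpy as np


def cosine_distance_matrix(query_feats, gallery_feats):
    """Pairwise cosine distance (1 - cosine similarity) between feature rows."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    return 1.0 - q @ g.T


def group_distance_vector(person_dists, gallery_group_ids, group_order):
    """GDV of one query pedestrian (assumed form): for each gallery group,
    keep the minimum distance to any of that group's members."""
    return np.array(
        [person_dists[gallery_group_ids == gid].min() for gid in group_order]
    )


def group_distance_matrix(query_feats, gallery_feats, gallery_group_ids):
    """Stack the GDVs of all query pedestrians into one matrix.

    Returns the matrix (rows: query pedestrians, columns: gallery groups)
    and the group IDs that label the columns."""
    gallery_group_ids = np.asarray(gallery_group_ids)
    group_order = np.unique(gallery_group_ids)
    dists = cosine_distance_matrix(query_feats, gallery_feats)
    gdm = np.stack(
        [group_distance_vector(row, gallery_group_ids, group_order) for row in dists]
    )
    return gdm, group_order


def retrieve_group(query_feats, gallery_feats, gallery_group_ids):
    """Rank gallery groups for one query group, closest first.

    Assumption: the query group's score for a gallery group is the mean
    of its members' GDV entries; the source does not specify this step."""
    gdm, group_order = group_distance_matrix(
        query_feats, gallery_feats, gallery_group_ids
    )
    scores = gdm.mean(axis=0)
    return group_order[np.argsort(scores)]


if __name__ == "__main__":
    # Toy 2-D features: two query pedestrians, four gallery pedestrians.
    query = np.array([[1.0, 0.0], [0.0, 1.0]])
    gallery = np.array([[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    group_ids = [10, 10, 20, 20]
    print(retrieve_group(query, gallery, group_ids))
```

Taking the minimum over group members means a query pedestrian only needs one good match inside a gallery group, which keeps the score insensitive to the order and layout of the remaining members.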

Framework of person re-identification in groups

This section discusses the details of our group-aided person re-id approach, which assists re-id by using the proposed group retrieval correlation (GRC). The pipeline of the proposed method is shown in Fig. 5. First, the naive features of single-person re-id are extracted by traditional feature extraction methods, including both hand-crafted and deep learning methods.

Second, these naive features are combined to adapt to the group association problem. Notably, the member-group

Experimental settings

The proposed SVIGR method is evaluated on three different datasets: our proposed SYSU-Group dataset, the i-LIDS dataset [4], and the Road Group dataset [14]. To train discriminative models, our Siamese network is implemented with four backbone CNN models pretrained on ImageNet [42]: CaffeNet [43], VGG16 [44], GoogleNet [45], and ResNet50 [41]. Their batch sizes are 128, 32, 64, and 16, respectively. After detection with YOLOv3, all obtained person images are resized to 256 ×

Conclusion

In many public security applications such as monitoring group criminal activity, group retrieval plays an important role in providing useful cues by identifying a target group across multiple cameras in the surveillance system. To address this issue, this paper first contributes a comprehensive benchmark including a novel group retrieval dataset named “SYSU-Group” and a group-guided re-id dataset named “Group-reID”. Both datasets cover realistic challenges such as viewpoint

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Ling Mei: Conceptualization, Methodology, Data curation, Writing - original draft. Jianhuang Lai: Project administration. Zhanxiang Feng: Visualization, Validation, Writing - review & editing. Xiaohua Xie: Formal analysis, Investigation.

Acknowledgment

This work was supported by the National Key Research and Development Program of China (2016YFB1001003), the NSFC (61902444, 61876104), and the International Program Fund for Young Talent Scientific Research People, Sun Yat-Sen University.

References (53)

  • B. Ma et al., Covariance descriptor based on bio-inspired features for person re-identification and face verification, IVC (2014)
  • H. Luo et al., Alignedreid++: dynamically matching local information for person re-identification, Pattern Recogn. (2019)
  • M. Moussaïd, The walking behaviour of pedestrian social groups and its impact on crowd dynamics, PloS One (2010)
  • A. Bialkowski, P. Lucey, X. Wei, S. Sridharan, Person re-identification using group information, in: DICTA, 2013, pp....
  • S. M. Assari, H. Idrees, M. Shah, Human re-identification in crowd videos using personal, social and environmental...
  • W. Zheng et al., Associating groups of people, BMVC (2009)
  • W. Lin et al., Group re-identification with multigrained matching and integration, IEEE Trans. Cybern. (2019)
  • L. Mei et al., Measuring crowd collectiveness via global motion correlation, ICCV Workshops (2019)
  • E. Ristani, C. Tomasi, Features for multi-target multi-camera tracking and re-identification, in: CVPR, 2018, pp....
  • L. Zheng, Y. Yang, A. G. Hauptmann, Person re-identification: Past, present and future, arXiv preprint...
  • L. Mei et al., Illumination-invariance optical flow estimation using weighted regularization transform, TCSVT (2019)
  • L. Mei et al., Geodesic-based probability propagation for efficient optical flow, Electron. Lett. (2018)
  • N. Gheissari, T. Sebastian, R. Hartley, Person re-identification using spatiotemporal appearance, in: CVPR, vol. 2,...
  • L. Mei et al., Siamese network for pedestrian group retrieval: a benchmark
  • L. Mei et al., Person re-identification using group constrain
  • H. Xiao, W. Lin, B. Sheng, K. Lu, J. Yan, J. Wang, E. Ding, Y. Zhang, H. Xiong, Group re-identification: leveraging and...
  • N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: CVPR, vol. 1, IEEE Computer Society,...
  • P. Dollár et al., Integral channel features, BMVC (2009)
  • J. Friedman et al., Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors), Ann. Stat. (2000)
  • B. Leibe, E. Seemann, B. Schiele, Pedestrian detection in crowded scenes, in: CVPR, vol. 1, IEEE, 2005, pp....
  • R. Girshick, Fast r-cnn, in: ICCV, 2015, pp....
  • W. Liu, D. Anguelov, D. Erhan, et al., Ssd: single shot multibox detector, in: ECCV, Springer, 2016, pp....
  • J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in: CVPR,...
  • J. Redmon, A. Farhadi, Yolov3: an incremental improvement, arXiv preprint...
  • G. Lisanti et al., Person re-identification by iterative re-weighted sparse ranking, TPAMI (2015)
  • B. Ma, Y. Su, F. Jurie, Local descriptors encoded by fisher vectors for person re-identification, in: ECCV, Springer,...

Ling Mei received his M.S. degree in pattern recognition and intelligent recognition from Sun Yat-sen University in 2016. He is currently pursuing the Ph.D. degree in information and communication engineering at Sun Yat-sen University, Guangdong, China. His research interests are in computer vision, image processing, crowd motion analysis, person re-identification, and optical flow. He has authored papers in IEEE TCSVT and ICPR, and won the 2nd Prize of the first and second National Graduate Contest on Smart-City and Creative Design in 2014 and 2015.

Jianhuang Lai received the Ph.D. degree in mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an assistant professor, where he is currently a Professor of the School of Data and Computer Science. His current research interests are in the areas of computer vision, digital image processing, pattern recognition, multimedia communication, and multiple target tracking. He has published over 100 scientific papers in international journals and conferences on image processing and pattern recognition, e.g., IEEE TPAMI, IEEE TNN, IEEE TKDE, IEEE TIP, IEEE TSMC (Part B), IEEE TCSVT, Pattern Recognition, ICCV, CVPR, and ICDM. He serves as a deputy director of the Image and Graphics Association of China. He is a Fellow of the Image and Graphics Society of China. He is also a senior member of the IEEE.

Zhanxiang Feng received the Ph.D. degree in information and communication engineering from Sun Yat-sen University, China, in 2018. He joined Sun Yat-sen University in 2019 as a post-doctoral fellow. His research interests include person re-identification, face recognition, face hallucination, image super-resolution, and visual surveillance. He has authored more than 10 papers in venues including IEEE TIP, Neurocomputing, and ICPR.

Xiaohua Xie received his B.S. degree in mathematics and applied mathematics (2005) from Shantou University, and an M.S. degree in information and computing science (2007) and a Ph.D. degree in applied mathematics (2010) from Sun Yat-sen University in China. He is currently an Associate Professor at Sun Yat-sen University (SYSU). Prior to joining SYSU, he was an Associate Professor at Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences. His current research fields cover image processing, computer vision, pattern recognition, and computer graphics. He has published more than a dozen papers in prestigious international journals and conferences. He is recognized as Overseas High-Caliber Personnel (Level B) in Shenzhen, China.
