Abstract
In many artificial intelligence applications, such as security, it is important to identify whether a specific group of pedestrians has been observed over a network of surveillance cameras, which corresponds to the pedestrian group retrieval problem. To address this issue, this paper contributes a novel dataset for pedestrian group retrieval named “SYSU-Group”. We collect diverse images with various poses from every camera for each group, which introduces realistic challenges into the dataset, such as viewpoint variations, illumination changes and internal exchanges of group members. Moreover, we propose the Siamese Verification-Identification based Group Retrieval (SVIGR) method, which combines verification and identification modules in a Siamese network to extract person features, follows the principle of minimum distance matching to measure the distance among pedestrian groups, and eventually produces a ranking for each query pedestrian group image. Experimental results demonstrate the superiority of SVIGR on the proposed group retrieval dataset.
1 Introduction
A group is several stable pedestrians gathering with high motion collectiveness for a sustained period of time, and it constitutes the primary unit of a crowd. In this paper, we focus on a novel task, Pedestrian Group Retrieval, which aims to re-identify a specific group of persons when they appear in other regions. Namely, given an image of a group in one scene, the task is to search the gallery for images of the same group under different scenes. Group retrieval is of great help to many public security applications, including anomaly detection, suspicious activity surveillance, tracking, and crowd behavior analysis. In recent years, surveillance technology has developed rapidly in many public places such as airports, train stations, and supermarkets, which makes the goal of group retrieval attainable [25, 35].
Group retrieval is a tremendously difficult problem: Cameras are placed far from each other without any overlap among their views, which leads to notoriously difficult problems such as different illumination conditions [20, 21] and large changes of viewpoint. Moreover, group retrieval requires identifying multiple persons who vary in pose and in the interactive movements of internal members. Since the group is the basic unit of a crowd, a crowd with changing members can be composed of several unrepeated groups with unchanging members; we therefore assume that the members of a group do not change and are unrepeated under different cameras during the group retrieval period. Since this field is at its very beginning, we do not consider the changing-membership case in this paper.
Group retrieval and person re-id differ fundamentally. Person re-id [6] re-identifies an individual person, while group retrieval re-identifies several specific persons as a whole. In addition, as persons tend to move together as a collective group in real life, members of a group have many more internal interactions than in person re-id, such as exchanging positions, chatting, or other interactive actions. Since person re-id fails to deal with the internal relationships within groups, the group retrieval task cannot be solved directly with traditional person re-id methods. To address these issues, group retrieval focuses more on the relationships among persons than person re-id does.
Considering the issues above, this paper makes three contributions: (1) We propose the new pedestrian group retrieval problem and its definition. Then we create a new group retrieval dataset named “SYSU-Group”, which contains 524 persons that constitute 208 unrepeated groups under 8 high definition (HD) cameras. In order to provide a reliable benchmark, we adopt mean Average Precision (mAP) and Cumulated Matching Characteristics (CMC) [13] for evaluation. (2) We propose a Siamese Verification-Identification based Group Retrieval (SVIGR) method to solve the group retrieval problem. Two convolutional neural networks, a verification module and an identification module, are combined to extract all the pedestrian features, and the cosine distance is used to measure the pedestrian distance. Moreover, we adopt the minimum distance matching principle to generate the distance vector of each group. In this way, we can obtain a ranking list in ascending order of distance. (3) Extensive experiments show the superiority of the proposed SVIGR method compared with state-of-the-art methods for group retrieval, and validate the generalization of SVIGR in terms of extracting original person re-id features. We report competitive results of SVIGR on the proposed dataset for the group retrieval task.
The article is organized as follows. In Sect. 1, we introduce the definition and challenges of the group retrieval task. In Sect. 2, we review related works on the proposed group retrieval method. In Sect. 3, we describe the SYSU-Group dataset and its evaluation protocol. Section 4 presents how to retrieve groups under multiple cameras with the proposed SVIGR framework. In Sect. 5, experimental results of the proposed method and comparable methods are presented. In Sect. 6, we give the conclusion.
2 Related Works
In this section, we summarize the work on different aspects of group retrieval.
Pedestrian Detection. Group retrieval algorithms need pedestrian detection for preprocessing. Traditional pedestrian detection methods locate a person by means of appearance cues: [3] utilizes the HOG feature with an SVM for prediction; [4] uses cascaded Adaboost [5] on Haar-like integral channel features to improve detection precision and efficiency; [12] uses cues from body parts to address occlusion in crowded scenes. Recently, deep learning methods such as Faster RCNN [7], SSD [16] and YOLO [23] have improved pedestrian detection performance by a large margin. Faster RCNN scores piles of candidate windows in two stages, while YOLO and SSD focus more on the global view in an end-to-end way. YOLO achieves excellent efficiency among these methods by splitting an image into grids to obtain possible bounding boxes (bboxes) with a fully convolutional network. Recently, YOLOv3 [24] replaced the softmax function with logistic regression thresholds and achieves higher accuracy than previous versions while retaining outstanding speed.
Feature Extraction. Most image retrieval tasks are closely associated with the method of feature extraction. WHOS [15] is a handcrafted, localized feature representation method that mitigates problems caused by background clutter and noise; WHOS features are weighted based on their distance to the center. A similar method is GOG [19], which weights patches according to their distance to the central line, although this is not well suited to issues like occlusion. On the other hand, LDFV [17] uses local descriptors that include pixel spatial location, gradient, and intensity information to encode a Fisher vector [26] representation. Moreover, gBiCov [18] is a multi-scale biologically inspired feature that encodes covariance descriptors; it computes the distance between two persons as the Euclidean distance between their signatures. Among texture based models, LOMO [14] uses HSV color histograms, extracts a scale-invariant LBP [22] texture operator, and then maximizes the occurrence to obtain a representation robust to viewpoint changes. ELF [8] is also a texture based method; it combines color histograms in the RGB, HSV and YCbCr color spaces with texture histograms. Furthermore, HistLBP [32] also encodes the RGB, HSV and YCbCr color spaces into color histograms and combines them with texture histograms of LBP features. All of the above feature extraction methods have already been applied to the person re-id task [10]; in this paper, we compare them with our proposed method to evaluate performance on the group retrieval task.
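None of the cited descriptors is reproduced here in full; as a simplified illustration of the stripe-and-histogram idea shared by methods such as ELF and HistLBP, a color-only sketch (hypothetical function name, RGB channels only, without the LBP texture channel) might look like the following:

```python
import numpy as np

def color_histogram_descriptor(img, bins=16, stripes=6):
    """Stripe-wise color histogram for one pedestrian bbox (H x W x 3, uint8)."""
    h = img.shape[0]
    feats = []
    for s in range(stripes):
        # Horizontal stripes roughly align body parts across viewpoints.
        stripe = img[s * h // stripes:(s + 1) * h // stripes]
        for c in range(3):  # one histogram per color channel
            hist, _ = np.histogram(stripe[..., c], bins=bins, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1))  # L1-normalize each histogram
    return np.concatenate(feats)  # length = stripes * 3 * bins
```

A full ELF- or HistLBP-style descriptor would additionally convert to HSV and YCbCr and append LBP texture histograms, but the stripe-then-concatenate structure is the same.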
Deeply Learned Modules. Besides the handcrafted color and texture based modules, deep learning based discriminative methods play increasingly important roles in pedestrian retrieval, such as [29], which learns robust features with refined part pooling (RPP). There are two mainstream types of deep learning modules: the verification module and the identification module. The verification module proposed in [2] is a deep metric learning approach used in signature verification; it takes a pair of images as input and outputs their similarity according to the cosine distance. Recently, researchers have applied verification modules to image retrieval tasks such as person re-id. Yi et al. [33] used the verification module to divide the person image into three horizontal parts and trained three part-CNNs to extract features. Wu et al. [31] improved the verification module by using smaller filters and a deeper network than the similar work of Ahmed et al. [1]. Verification modules are limited in that the query image needs to be paired with every gallery image, which makes the computation inefficient. On the other hand, some researchers have combined the verification module with the identification module: the DeepID networks [28] optimize the network with both verification and identification modules and perform well on face recognition. Additionally, Zheng et al. [36] combined the two modules and applied them to the person re-id task efficiently. Motivated by these works, we propose our SVIGR method to solve the group retrieval problem.
3 SYSU-Group Dataset
In this section, a new group retrieval dataset, the “SYSU-Group” dataset, is introduced. We begin with a description of the proposed dataset, then introduce the criterion for annotating group identities and the criterion for generating the ground truth of group retrieval results, and finally present the evaluation protocol.
3.1 Description of the Dataset
As shown in Fig. 1, all the group images are collected from a total of 8 cameras, including three 1920 \(\times \) 1080 full High Definition (HD) cameras placed in diverse indoor campus scenes (cam1 - cam3) and five 1280 \(\times \) 1080 HD cameras located in diverse outdoor campus scenes (cam4 - cam8). The cameras' views do not overlap with each other. The group images are obtained as screenshots of the whole pedestrian group area from the original video frames, and each group image is resized to 256 \(\times \) 256 pixels. The dataset contains 7071 fully annotated bboxes of 208 different group identities; the number of persons in each group ranges from 2 to 6, and images of each group identity are captured by at most eight cameras. More detailed statistics about our dataset are given in Table 1.
The target of group retrieval is different from person re-id: Group retrieval re-identifies several persons who interact with each other, such as by exchanging internal positions and chatting, while person re-id involves only a single person per image patch, almost without interaction with others. Therefore, group retrieval identifies persons in a group not only by their appearance but also via their correlations.
To avoid ambiguity, according to the assumption of unchanging group members in Sect. 1, the same group has fixed members under all cameras. However, the sizes of different groups are diverse, which resembles reality. Since this field is at its very beginning, and in order to focus on group retrieval rather than detection, all images in the proposed group retrieval dataset are hand-cropped and annotated from the original video; these hand-cropped boxes contain each whole group without noise such as irrelevant passersby. The SYSU-Group dataset is challenging for its diversity in pose, age, illumination, occlusion and actions, which is similar to real-life scenarios. For better presentation, we show some query images from the dataset as exemplars of its variety in Fig. 1.
3.2 Ground Truth
In addition to the standard cropped group boxes, we also provide the ground truth of the group retrieval result. We notice that the Market1501 dataset [34] also provides “good” and “junk” indexes for all cropped boxes for evaluation. Accordingly, for a hand-cropped annotated group image \(id\_cam\) (where id and cam denote the group identity and the camera that captured the image, respectively), we first partition it according to its camera and assign it a sequence number. Then we mark images with the same id but a different cam as “good” images, while images with the same id and the same cam are marked as “junk” images, meaning these images have zero influence on the group retrieval accuracy.
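Following the Market1501 convention described above, the labeling rule can be sketched as a short function (the function name and the "negative" label for different-id images are our own illustrative choices, not from the paper):

```python
def mark_gallery(query_id, query_cam, gallery):
    """Label gallery images relative to one query, Market1501-style.

    gallery: list of (group_id, cam) pairs parsed from the id_cam naming.
    """
    labels = []
    for gid, cam in gallery:
        if gid == query_id and cam != query_cam:
            labels.append("good")      # true match seen from another camera
        elif gid == query_id:
            labels.append("junk")      # same camera: ignored during scoring
        else:
            labels.append("negative")  # different group: wrong if ranked high
    return labels
```

For a query of group 5 under cam1, a gallery image of group 5 under cam2 is "good", one of group 5 under cam1 is "junk", and any image of another group counts against the ranking.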
3.3 Evaluation Protocol
In this paper, we use mean Average Precision (mAP) and Cumulated Matching Characteristics (CMC) to evaluate group retrieval performance; these are common evaluation metrics in the person re-id task. CMC gives the cumulative accuracy that a query group identity appears within the top ranks of the gallery candidate list.
The group retrieval dataset is divided into training and testing sets by randomly splitting the id numbers of all groups: the 104 training groups contain 251 persons and the 104 testing groups contain 273 persons, and the numbers of training and testing images are 3558 and 3513, respectively. In the testing set, we select one query for each group from each of its appeared cameras to make up the query set; in this way, a group has at most 8 queries. The rest of the testing set forms the gallery set, giving 704 query images and 2809 gallery images in total. Our dataset is an ideal benchmark for group retrieval methods and can evaluate their generalization capacity for practical usage.
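Under the good/junk labeling of Sect. 3.2, the per-query evaluation can be sketched as follows (a minimal NumPy sketch with a hypothetical function name; mAP is then the mean of the per-query average precisions, and rank-k CMC is the mean of the per-query CMC curves at position k):

```python
import numpy as np

def evaluate_single_query(ranked, good, junk):
    """CMC curve and average precision for one query.

    ranked: gallery indices sorted by ascending group distance.
    good / junk: index sets from the ground truth of Sect. 3.2.
    """
    kept = [g for g in ranked if g not in junk]   # junk images have zero influence
    hits = np.array([g in good for g in kept], dtype=float)
    cmc = (np.cumsum(hits) > 0).astype(int)       # 1 from the first correct match on
    precision = np.cumsum(hits) / (np.arange(len(kept)) + 1)
    ap = (precision * hits).sum() / max(hits.sum(), 1.0)
    return cmc, ap
```

For instance, a ranking whose only good image sits at position 2 (after junk removal) yields an AP of 0.5 and a CMC curve that reaches 1 at rank 2.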
4 Our Method
4.1 Overall Framework
Our method has four steps. First, we use an advanced pedestrian detector to extract all individual persons from groups. Second, we adopt a convolutional Siamese network to extract the features of all individual persons, then generate all mutual distances of individuals by computing the cosine distance. Third, for each group, we merge all the mutual distances belonging to that group to generate the group distance vector (GDV). Finally, the group distance vectors are concatenated into a group distance matrix, from which we obtain the similarity scores of groups and produce the group retrieval rank list. The overall framework is shown in Fig. 2.
4.2 Pedestrian Detection
We use an advanced pedestrian detection method, YOLOv3 [24], to detect the members in each group image. YOLOv3 is a fast and accurate detector that adopts the efficient Darknet-53 network [24], which better utilizes the GPU. We employ YOLOv3 to detect all pedestrian bboxes in each group and divide them into training and testing sets according to their ids, as mentioned in Sect. 3.3. In addition, all the pedestrian bboxes can be used as person re-id data to compare the robustness of feature extraction methods, which is described in Sect. 5.2.
4.3 Feature Extraction
After obtaining the bbox images, we train the Siamese network proposed in [36] to extract person re-id features of individuals. The Siamese network combines a verification module with an identification module. In particular, in the training stage, we use the group identity (Group-ID) as the class label for training the classifier, which reflects the specific relationships among groups.
The identification module treats group retrieval as a classification task with M classes of different groups, but its identification loss \(L_I\) does not directly control the inter-class and intra-class discrepancies of groups. To address this problem, the verification module is introduced to treat group retrieval as a binary classification task that indicates whether two images belong to the same group: the verification loss \({L_V}\) takes a pair of images as input and outputs a label according to their similarity, which enlarges the inter-class discrepancy between different groups and narrows the intra-class discrepancy within the same group. We use the verification module only in the training stage, to learn proper network parameters for extracting pedestrian features in the testing stage. Moreover, since the classes for verification are groups rather than individual persons, its computing cost is low. The total training loss L of the Siamese network is computed as
$$\begin{aligned} L = L_I(f, t, \theta _I) + L_V(f_{pos}, f_{neg}, \theta _V), \quad L_I = \sum _{c=1}^{M} -u_c \log ({\hat{u}}_c), \quad L_V = \sum _{c=0}^{1} -v_c \log ({\hat{v}}_c). \end{aligned}$$
(1)
Here, \(f\), \(f_{pos}\), and \(f_{neg}\) denote the outputs of the identification module and of the verification module with the positive pair and the negative pair in Fig. 2, respectively. In Eq. 1, we use the cross entropy loss both in \(L_I\) for identity prediction and in \(L_V\) for pair verification. t is the target class, and \(\theta _{I}\) and \( \theta _{V}\) are the parameters of the convolutional layers of the identification module and the verification module, respectively. \({\hat{u}}_c\) and \(u_c\) are the predicted and target probabilities for class c, where \(u_t=1\) for the target class and \(u_c=0\) otherwise. The definitions of \({\hat{v}}_c\) and \(v_c\) for pair verification are analogous: for a positive pair, \(v_0=1, v_1 = 0\); otherwise, \(v_0=0, v_1=1\).
For the aforementioned reasons, the Siamese network combines the verification loss and the identification loss so that each makes up for the other's shortcomings; the network can predict the identity of a group and give the similarity of a pair of person bboxes. After optimizing the loss function of the Siamese network in the training stage, we use the optimal parameters \(\theta _{I}\) and \( \theta _{V}\) to extract pedestrian features in the testing stage.
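The combined objective of Eq. 1 can be sketched in NumPy for a single image pair, under the assumption that the CNN forward passes are abstracted away into precomputed logits (function names and argument layout are illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def cross_entropy(logits, target):
    # -sum_c u_c log(u_hat_c) with a one-hot target reduces to -log p[target]
    return -np.log(softmax(logits)[target])

def siamese_loss(id_logits_a, id_logits_b, verif_logits, gid_a, gid_b):
    """Total loss L = L_I + L_V on one image pair (Eq. 1)."""
    # Identification: M-way cross entropy per branch, Group-IDs as labels.
    L_I = cross_entropy(id_logits_a, gid_a) + cross_entropy(id_logits_b, gid_b)
    # Verification: 2-way cross entropy, class 0 = same group, class 1 = different.
    L_V = cross_entropy(verif_logits, 0 if gid_a == gid_b else 1)
    return L_I + L_V
```

A confident, correct prediction for a same-group pair drives all three terms toward zero, while a wrong identity or verification prediction inflates the corresponding term.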
4.4 Group Distance Vector (GDV)
In group retrieval, we adopt the minimum distance matching principle to generate the GDV. For a query group q with m members and a gallery group g with n members, we denote by \(d_{q, g}^{i, j}\) the pedestrian distance between member i in q and member j in g, computed as the cosine distance between their pedestrian features. For member i in group q, it is intuitive to match it to the member of group g at minimum distance; therefore, the optimal member-group distance between member i in group q and group g is denoted as
$$\begin{aligned} d_{q,g}^{i} = \min _{j \in \{1,\dots ,n\}} d_{q,g}^{i,j}. \end{aligned}$$
(2)
In order to obtain a robust expression of the group distance that handles the intra-class discrepancy caused by, e.g., internal position exchanges within a group, we adopt the minimum distance matching principle of Eq. 2 to find the most similar member pairs. We give two strategies to measure the group distance. One is the Minimum Member Distance (MMD): since, by the assumption in Sect. 1, different groups share no members, we denote the group distance as
$$\begin{aligned} d_{q,g} = \min _{i \in \{1,\dots ,m\}} d_{q,g}^{i}. \end{aligned}$$
(3)
Here, the minimum distance over all member pairs can verify whether groups q and g share a member. An alternative strategy is the Average Distance (AD), defined as
$$\begin{aligned} d_{q,g} = \frac{1}{m} \sum _{i=1}^{m} d_{q,g}^{i}. \end{aligned}$$
(4)
In this way, we can measure the distance between groups whether or not their sizes are equal; we then sort these distances in ascending order to obtain the GDV
$$\begin{aligned} \mathbf{{d}}_q = \left[ d_{q,1}, d_{q,2}, \dots , d_{q,G} \right] , \end{aligned}$$
where G is the number of gallery images. We then obtain the group distance matrix D by merging the GDVs as
$$\begin{aligned} \mathbf{{D}} = \left[ \mathbf{{d}}_1; \mathbf{{d}}_2; \dots ; \mathbf{{d}}_Q \right] , \end{aligned}$$
where Q is the number of query images. In this way, the group distance matrix is formed by concatenating the vectors of all query images. When two group images need to be matched, their distance \(d_{q,g}\) can be found via the row index q and the column index g of \(\mathbf{{D}}\), and the rank list of the query group image can then be generated. No matter how the group members exchange positions, as long as the members of the group remain the same, the group distances \(d_{q,g}\) defined in Eqs. 3 and 4 are invariant. Therefore, the disturbance caused by exchanges of internal members within a group is avoided.
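The GDV construction above can be sketched directly from Eqs. 2-4 (a minimal NumPy sketch with hypothetical function names; each group is an array of its members' feature vectors):

```python
import numpy as np

def cosine_distance_matrix(qf, gf):
    """d[i, j] between query member features qf (m x d) and gallery gf (n x d)."""
    qn = qf / np.linalg.norm(qf, axis=1, keepdims=True)
    gn = gf / np.linalg.norm(gf, axis=1, keepdims=True)
    return 1.0 - qn @ gn.T

def group_distance(qf, gf, strategy="MMD"):
    d = cosine_distance_matrix(qf, gf)
    member_group = d.min(axis=1)      # Eq. 2: best match for each query member
    if strategy == "MMD":
        return member_group.min()     # Eq. 3: Minimum Member Distance
    return member_group.mean()        # Eq. 4: Average Distance

def rank_gallery(qf, gallery_groups, strategy="MMD"):
    """GDV of one query group, plus the ascending-order rank list."""
    gdv = np.array([group_distance(qf, gf, strategy) for gf in gallery_groups])
    return np.argsort(gdv), gdv
```

Because the member-wise minimum in Eq. 2 is taken over all pairs, permuting the rows of either group's feature matrix leaves the group distance unchanged, which is exactly the invariance to internal position exchanges claimed above.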
5 Experiments
5.1 Performance of SVIGR Framework
For evaluation purposes, we implement our Siamese network with four backbone CNN models: CaffeNet [11], VGG16 [27], GoogleNet [30], and ResNet50 [9]. In our experiments, all person re-id methods are used as basic feature extractors that take Group-IDs as class labels for training the classifier; there are 104 different Group-IDs in each of the training and testing stages, as shown in Table 1. We then generate the GDV from the person re-id features following the minimum distance matching strategy, which is commonly used to measure the similarity between query and gallery. We compare the MMD strategy (Eq. 3) with the AD strategy (Eq. 4) in our SVIGR method. As Table 2 shows, MMD surpasses AD for all four CNN backbone networks and all handcrafted methods, especially in rank-1 precision and mAP. Therefore, we use MMD to obtain the GDV in the following experiments.
Moreover, our method surpasses all 8 state-of-the-art re-id methods, both handcrafted and deep learning based, in Table 2. Although RPP [29] is a strong deep learning baseline for person re-id, our method outperforms it on all metrics, which reveals the superiority of our proposal. The best rank-1 precision and mAP are achieved by our method with ResNet50, reaching 94.5% and 76.7%, exceeding RPP by 20.1% and 30.5%, respectively. Furthermore, Fig. 3 illustrates group retrieval results under complex scenarios, including occlusion and illumination changes, which are detrimental to retrieval. Notably, the results show that our method not only re-identifies the correct group in other cameras, but also overcomes the challenge of large viewpoint changes and the illumination variation they bring about.
In addition, to determine the contribution of each module, we perform ablation experiments with our proposed method on the dataset. Besides jointly training networks with both modules, we also train the verification module and the identification module individually. Table 3 gives the quantitative results in rank-1 precision and mAP. The results for all networks show that identification matters more to accuracy, since that module performs the classification task, and that combining the two modules improves performance further. These results validate the effectiveness of combining the two modules in our SVIGR method.
5.2 Further Experiment on Person Re-id
In addition, we evaluate the generalization ability of the proposed SVIGR on the traditional person re-id task. We use the original person re-id features extracted by SVIGR with the four backbone networks, shown in bold in Table 4. We use the person re-id data extracted from the group images by the YOLOv3 detector [24] on the SYSU-Group dataset. As divided in Table 1, there are 251 persons in the training set and 273 in the testing set.
In this experiment, we compare the performance of SVIGR with two representative kinds of person re-id methods: handcrafted methods such as LOMO [14], and state-of-the-art deep methods such as RPP [29].
In particular, in the training stage, we follow our group retrieval task and use Group-IDs as labels for all methods to extract original person re-id features. In the testing stage, the evaluation metrics are mAP and Rank-1, following the traditional person re-id task. The experimental results in Table 4 show that SVIGR improves on the traditional handcrafted method by a large margin and exceeds the state-of-the-art deep learning method as well; these results verify the robustness of SVIGR in terms of feature extraction.
6 Conclusion
This paper first defines the group retrieval problem, then introduces a high quality group retrieval dataset and its evaluation protocol. Furthermore, an efficient SVIGR method is proposed to solve the group retrieval problem. In the future, more robust feature learning methods will be considered so that group retrieval can further benefit person re-id, and more innovative solutions to this problem will be explored.
References
Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: CVPR, pp. 3908–3916 (2015)
Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a “siamese” time delay neural network. In: NIPS, pp. 737–744 (1994)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR, vol. 1, pp. 886–893. IEEE Computer Society (2005)
Dollár, P., Tu, Z., Perona, P., Belongie, S.: Integral channel features (2009)
Friedman, J., Hastie, T., Tibshirani, R., et al.: Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 28(2), 337–407 (2000)
Gheissari, N., Sebastian, T., Hartley, R.: Person reidentification using spatiotemporal appearance. In: CVPR, vol. 2, pp. 1528–1535. IEEE (2006)
Girshick, R.: Fast R-CNN. In: ICCV, pp. 1440–1448 (2015)
Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 262–275. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88682-2_21
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Karanam, S., Gou, M., Wu, Z., Rates-Borras, A., Camps, O., Radke, R.J.: A systematic evaluation and benchmark for person re-identification: features, metrics, and datasets. arXiv preprint arXiv:1605.09653 (2016)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
Leibe, B., Seemann, E., Schiele, B.: Pedestrian detection in crowded scenes. In: CVPR, vol. 1, pp. 878–885. IEEE (2005)
Li, W., Zhao, R., Xiao, T., Wang, X.: DeepReID: deep filter pairing neural network for person re-identification. In: CVPR, pp. 152–159 (2014)
Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: CVPR, pp. 2197–2206 (2015)
Lisanti, G., Masi, I., Bagdanov, A.D., Del Bimbo, A.: Person re-identification by iterative re-weighted sparse ranking. TPAMI 37(8), 1629–1642 (2015)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Ma, B., Su, Y., Jurie, F.: Local descriptors encoded by fisher vectors for person re-identification. In: Fusiello, A., Murino, V., Cucchiara, R. (eds.) ECCV 2012. LNCS, vol. 7583, pp. 413–422. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33863-2_41
Ma, B., Su, Y., Jurie, F.: Covariance descriptor based on bio-inspired features for person re-identification and face verification. IVC 32(6–7), 379–390 (2014)
Matsukawa, T., Okabe, T., Suzuki, E., Sato, Y.: Hierarchical Gaussian descriptor for person re-identification. In: CVPR, pp. 1363–1372 (2016)
Mei, L., Lai, J., Xie, X., Zhu, J., Chen, J.: Illumination-invariance optical flow estimation using weighted regularization transform. TCSVT 1 (2019)
Mei, L., Chen, Z., Lai, J.: Geodesic-based probability propagation for efficient optical flow. Electron. Lett. 54, 758–760 (2018)
Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. PR 29(1), 51–59 (1996)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR, pp. 779–788 (2016)
Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Ristani, E., Tomasi, C.: Features for multi-target multi-camera tracking and re-identification. In: CVPR, pp. 6036–6046 (2018)
Sánchez, J., Perronnin, F., Mensink, T., Verbeek, J.: Image classification with the fisher vector: theory and practice. IJCV 105(3), 222–245 (2013)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Sun, Y., Wang, X., Tang, X.: Deeply learned face representations are sparse, selective, and robust. In: CVPR, pp. 2892–2900 (2015)
Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: person retrieval with refined part pooling (and a strong convolutional baseline). In: ECCV, pp. 480–496 (2018)
Szegedy, C., et al.: Going deeper with convolutions. In: CVPR, pp. 1–9 (2015)
Wu, L., Shen, C., van den Hengel, A.: PersonNet: person re-identification with deep convolutional neural networks. arXiv preprint arXiv:1601.07255 (2016)
Xiong, F., Gou, M., Camps, O., Sznaier, M.: Person re-identification using kernel-based metric learning methods. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8695, pp. 1–16. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10584-0_1
Yi, D., Lei, Z., Liao, S., Li, S.Z.: Deep metric learning for person re-identification. In: ICPR, pp. 34–39. IEEE (2014)
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: CVPR, pp. 1116–1124 (2015)
Zheng, L., Yang, Y., Hauptmann, A.G.: Person re-identification: past, present and future. arXiv preprint arXiv:1610.02984 (2016)
Zheng, Z., Zheng, L., Yang, Y.: A discriminatively learned CNN embedding for person reidentification. TOMM 14(1), 13 (2018)
Acknowledgements
This work was supported by National Key Research and Development Program of China (2016YFB1001003), the NSFC (61573387).
© 2019 Springer Nature Switzerland AG
Mei, L., Lai, J., Xie, X., Chen, Z. (2019). Siamese Network for Pedestrian Group Retrieval: A Benchmark. In: Zhao, Y., Barnes, N., Chen, B., Westermann, R., Kong, X., Lin, C. (eds) Image and Graphics. ICIG 2019. Lecture Notes in Computer Science(), vol 11901. Springer, Cham. https://doi.org/10.1007/978-3-030-34120-6_61