Abstract
Person detection and re-identification (re-ID) are two well-defined support tasks for practically relevant applications such as Person Search and Multiple Person Tracking. Person Search aims to find and locate all instances with the same identity as the query person in a set of panoramic gallery images. Similarly, Multiple Person Tracking, especially when using the tracking-by-detection pipeline, requires detecting and associating all persons that appear in consecutive video frames. One major challenge shared by the two tasks comes from the contradictory goals of detection and re-identification, i.e., person detection focuses on what all persons have in common, while person re-ID focuses on the differences among identities. Therefore, it is crucial to reconcile the two support tasks in a joint model. To this end, we present a novel approach called Norm-Aware Embedding, which disentangles the person embedding into norm and angle for detection and re-ID respectively, allowing for both effective and efficient multi-task training. We further extend the proposal-level person embedding to the pixel level, where its discrimination ability is less affected by misalignment. Our Norm-Aware Embedding achieves remarkable performance on both person search and multiple person tracking benchmarks, with the merit of being easy to train and resource-friendly.
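The norm/angle split described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 2-D vectors, and the sigmoid with hypothetical `scale` and `shift` parameters are assumptions made purely for illustration. The only idea taken from the abstract is that the norm of the embedding serves detection and its direction (angle) serves re-ID.

```python
import math

def decompose(embedding):
    """Split a vector into its L2 norm and its unit-length direction."""
    norm = math.sqrt(sum(v * v for v in embedding))
    if norm == 0.0:
        return 0.0, [0.0] * len(embedding)
    return norm, [v / norm for v in embedding]

def detection_score(norm, scale=1.0, shift=0.0):
    """Map the norm to (0, 1) with a sigmoid.

    scale and shift are hypothetical parameters, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-(scale * norm + shift)))

def reid_similarity(direction_a, direction_b):
    """Cosine similarity between two unit directions (the angular part)."""
    return sum(a * b for a, b in zip(direction_a, direction_b))

# Toy embeddings (assumed values): two boxes of one identity, one background box.
person_a = [3.0, 4.0]
person_b = [6.0, 8.0]
background = [0.1, -0.1]

norm_a, dir_a = decompose(person_a)
norm_b, dir_b = decompose(person_b)
norm_bg, dir_bg = decompose(background)

# A larger norm gives a higher person score; identity matching ignores the norm.
score_a = detection_score(norm_a, shift=-2.0)
score_bg = detection_score(norm_bg, shift=-2.0)
same_identity = reid_similarity(dir_a, dir_b)
```

In this toy setup the two person vectors differ in length but point in the same direction, so their angular similarity is maximal while each still receives its own detection score from its norm.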









Notes
Code will be updated at this site.
Acknowledgements
This work was partially supported by the National Science Fund of China (Grant No. U1713208), Funds for International Co-operation and Exchange of the National Natural Science Foundation of China (Grant No. 61861136011), “111” Program B13022, Natural Science Foundation of Jiangsu Province, China (Grant No. BK20181299), and National Key Research and Development Program of China (Grant No. 2017YFC0820601).
Additional information
Communicated by Ivan Laptev.
About this article
Cite this article
Chen, D., Zhang, S., Yang, J. et al. Norm-Aware Embedding for Efficient Person Search and Tracking. Int J Comput Vis 129, 3154–3168 (2021). https://doi.org/10.1007/s11263-021-01512-5
DOI: https://doi.org/10.1007/s11263-021-01512-5