Abstract
Background interference, which arises from complex environments, is a critical problem for a robust person re-identification (re-ID) system, because background noise can significantly compromise feature learning and matching. To reduce this interference, this paper proposes a saliency image embedding as a pedestrian descriptor. First, to eliminate the background of each pedestrian image, a saliency image is constructed using an unsupervised saliency detection algorithm based on manifold ranking. Second, to compensate for errors and lost pedestrian details introduced during saliency image construction, we design a saliency image fusion (SIF) convolutional neural network (CNN) architecture that takes both the original pedestrian image and the saliency image as input. We implement this idea in identification models built on several state-of-the-art backbone CNNs (i.e., CaffeNet, VGGNet-16, GoogLeNet and ResNet-50). We show that the pedestrian descriptor learned by the proposed SIF CNN architecture significantly improves over the baselines and achieves performance competitive with state-of-the-art person re-ID methods on three large-scale person re-ID benchmarks (i.e., Market-1501, DukeMTMC-reID and MARS).
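The abstract does not spell out the saliency step. Since the reference list includes graph-based manifold ranking (Yang et al., CVPR 2013), the following is a minimal sketch of that general technique, not the authors' implementation. Image regions are graph nodes, and ranking scores are computed as f* = (D − αW)^{-1} y, where W is the affinity matrix, D its degree matrix and y marks the query nodes. Assumptions, all illustrative: a 4 × 4 grid stands in for superpixels, node features are 1-D intensities, the four image borders are merged into one background-query set, the σ and α values are arbitrary choices, and the helper names (`two_stage_saliency`, etc.) are hypothetical. The "saliency image" built at the end simply weights each region by its saliency, which is an assumed construction.

```python
import math


def grid_graph(rows, cols):
    """4-neighbour edges of a rows x cols grid (a stand-in for superpixel adjacency)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    return edges


def affinity(features, edges, sigma=0.1):
    """Affinity matrix W with w_ij = exp(-||c_i - c_j|| / sigma^2) on edges, 0 elsewhere."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = math.exp(-math.dist(features[i], features[j]) / sigma ** 2)
    return W


def manifold_rank(W, y, alpha=0.99, max_iter=10000, tol=1e-9):
    """Ranking scores f* = (D - alpha * W)^-1 y, solved by Jacobi iteration.

    D is the diagonal degree matrix, d_ii = sum_j w_ij. Assumes every node has
    positive degree; with 0 < alpha < 1 the iteration matrix alpha * D^-1 W has
    spectral radius at most alpha, so the iteration converges.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    f = [0.0] * n
    for _ in range(max_iter):
        new = [(y[i] + alpha * sum(W[i][j] * f[j] for j in range(n))) / deg[i]
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, f)) < tol:
            return new
        f = new
    return f


def normalize(f):
    """Min-max normalise scores to [0, 1]."""
    lo, hi = min(f), max(f)
    return [0.0] * len(f) if hi == lo else [(v - lo) / (hi - lo) for v in f]


def two_stage_saliency(features, rows, cols):
    """Hypothetical helper: background-query ranking, then foreground-query ranking."""
    W = affinity(features, grid_graph(rows, cols))
    # Stage 1: boundary regions serve as background queries (simplified: all four
    # sides in one query set); saliency is the complement of the ranking score.
    boundary = [1.0 if r in (0, rows - 1) or c in (0, cols - 1) else 0.0
                for r in range(rows) for c in range(cols)]
    stage1 = [1.0 - v for v in normalize(manifold_rank(W, boundary))]
    # Stage 2: regions above the mean stage-1 saliency become foreground queries.
    mean = sum(stage1) / len(stage1)
    foreground = [1.0 if v > mean else 0.0 for v in stage1]
    return normalize(manifold_rank(W, foreground))


# Toy 4 x 4 "image": a bright 2 x 2 pedestrian region on a dark background.
rows, cols = 4, 4
inner = {5, 6, 9, 10}
feats = [(0.9,) if i in inner else (0.1,) for i in range(rows * cols)]
sal = two_stage_saliency(feats, rows, cols)
# Assumed "saliency image": each region's intensity weighted by its saliency.
sal_img = [feat[0] * s for feat, s in zip(feats, sal)]
print([round(s, 2) for s in sal])
```

On this toy input the inner pedestrian regions should receive saliency close to 1 and the border regions close to 0, so the weighted "saliency image" keeps the pedestrian and suppresses the background. Jacobi iteration is used only to keep the sketch dependency-free; any direct linear solver would compute the same f*.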
Notes
Note that we use the CaffeNet [21] backbone model merely as an example.
References
Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: Proceedings of the CVPR, pp. 1597–1604 (2009)
Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: Proceedings of the CVPR, pp. 3908–3916 (2015)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of the COMPSTAT’2010, pp. 177–186 (2010)
Chang, X., Nie, F., Wang, S., Yang, Y., Zhou, X., Zhang, C.: Compound rank-\(k\) projections for bilinear analysis. IEEE Trans. Neural Netw. Learn. Syst. 27(7), 1502–1513 (2016)
Chen, D., Yuan, Z., Chen, B., Zheng, N.: Similarity learning with spatial constraints for person re-identification. In: Proceedings of the CVPR, pp. 1268–1277 (2016)
Cheng, D., Gong, Y., Zhou, S., Wang, J., Zheng, N.: Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In: Proceedings of the CVPR, pp. 1335–1344 (2016)
Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the ICML, pp. 209–216 (2007)
Dehghan, A., Modiri Assari, S., Shah, M.: GMMCP tracker: globally optimal generalized maximum multi clique problem for multiple object tracking. In: Proceedings of the CVPR, pp. 4091–4099 (2015)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the CVPR, pp. 248–255 (2009)
Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance comparison for person re-identification. Pattern Recogn. 48(10), 2993–3003 (2015)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
Fu, H., Zhao, H., Kong, X., Zhang, X.: BHOG: binary descriptor for sketch-based image retrieval. Multimed. Syst. 22(1), 127–136 (2016)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the CVPR, pp. 580–587 (2014)
Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Proceedings of the ECCV, pp. 262–275 (2008)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the CVPR, pp. 770–778 (2016)
Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Proceedings of the Scandinavian Conference on Image Analysis, pp. 91–102 (2011)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678 (2014)
Jiang, Z., Davis, L.S.: Submodular salient region detection. In: Proceedings of the CVPR, pp. 2043–2050 (2013)
Jose, C., Fleuret, F.: Scalable metric learning via weighted approximate rank component analysis. In: Proceedings of the ECCV, pp. 875–890 (2016)
Koestinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Large scale metric learning from equivalence constraints. In: Proceedings of the CVPR, pp. 2288–2295 (2012)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the NIPS, pp. 1097–1105 (2012)
Li, W., Wang, X.: Locally aligned feature transforms across views. In: Proceedings of the CVPR, pp. 3594–3601 (2013)
Li, W., Zhao, R., Wang, X.: Human reidentification with transferred metric learning. In: Proceedings of the ACCV, pp. 31–44 (2012)
Li, W., Zhao, R., Xiao, T., Wang, X.: DeepReID: deep filter pairing neural network for person re-identification. In: Proceedings of the CVPR, pp. 152–159 (2014)
Li, X.: Tag relevance fusion for social image retrieval. Multimed. Syst. 23(1), 29–40 (2017)
Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the CVPR, pp. 2197–2206 (2015)
Lin, Y., Zheng, L., Zheng, Z., Wu, Y., Yang, Y.: Improving person re-identification by attribute and identity learning (2017). arXiv:1703.07220
Liu, H., Feng, J., Qi, M., Jiang, J., Yan, S.: End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. 26(7), 3492–3506 (2017)
Liu, J., Li, Z., Lu, H.: Sparse semantic metric learning for image retrieval. Multimed. Syst. 20(6), 635–643 (2014)
Liu, Y., Shao, Y., Sun, F.: Person re-identification based on visual saliency. In: Proceedings of the International Conference on Intelligent Systems Design and Applications (ISDA), pp. 884–889 (2012)
Lu, S., Mahadevan, V., Vasconcelos, N.: Learning optimal seeds for diffusion-based salient object detection. In: Proceedings of the CVPR, pp. 2790–2797 (2014)
Martinel, N., Das, A., Micheloni, C., Roy-Chowdhury, A.K.: Temporal model adaptation for person re-identification. In: Proceedings of the ECCV, pp. 858–877 (2016)
Martinel, N., Micheloni, C., Foresti, G.L.: Kernelized saliency-based person re-identification through multiple metric learning. IEEE Trans. Image Process. 24(12), 5645–5658 (2015)
Matsukawa, T., Okabe, T., Suzuki, E., Sato, Y.: Hierarchical Gaussian descriptor for person re-identification. In: Proceedings of the CVPR, pp. 1363–1372 (2016)
Qi, G.J., Hua, X.S., Rui, Y., Tang, J., Mei, T., Zhang, H.J.: Correlative multi-label video annotation. In: Proceedings of the ACM International Conference on Multimedia, pp. 17–26 (2007)
Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples (2016). arXiv:1604.02426
Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: Proceedings of the ECCV Workshop, pp. 17–35 (2016)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the CVPR, pp. 815–823 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556
Su, C., Zhang, S., Xing, J., Gao, W., Tian, Q.: Deep attributes driven multi-camera person re-identification. In: Proceedings of the ECCV, pp. 475–491 (2016)
Sun, Y., Zheng, L., Deng, W., Wang, S.: SVDNet for pedestrian retrieval (2017). arXiv:1703.05693
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the CVPR, pp. 1–9 (2015)
Tesfaye, Y.T., Zemene, E., Prati, A., Pelillo, M., Shah, M.: Multi-target tracking in multiple non-overlapping cameras using constrained dominant sets (2017). arXiv:1706.06196
Tong, N., Lu, H., Ruan, X., Yang, M.H.: Salient object detection via bootstrap learning. In: Proceedings of the CVPR, pp. 1884–1892 (2015)
Ustinova, E., Ganin, Y., Lempitsky, V.: Multi-region bilinear convolutional neural networks for person re-identification. In: Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6 (2017)
Varior, R.R., Haloi, M., Wang, G.: Gated siamese convolutional neural network architecture for human re-identification. In: Proceedings of the ECCV, pp. 791–808 (2016)
Varior, R.R., Shuai, B., Lu, J., Xu, D., Wang, G.: A siamese long short-term memory architecture for human re-identification. In: Proceedings of the ECCV, pp. 135–153 (2016)
Wang, H., Gong, S., Xiang, T.: Unsupervised learning of generative topic saliency for person re-identification. In: Proceedings of the BMVC (2014)
Wang, M., Konrad, J., Ishwar, P., Jing, K., Rowley, H.: Image saliency: from intrinsic to extrinsic context. In: Proceedings of the CVPR, pp. 417–424 (2011)
Wei, Y., Wen, F., Zhu, W., Sun, J.: Geodesic saliency using background priors. In: Proceedings of the ECCV, pp. 29–42 (2012)
Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. In: Proceedings of the NIPS, pp. 1473–1480 (2005)
Wu, L., Shen, C., Hengel, A.v.d.: PersonNet: person re-identification with deep convolutional neural networks (2016). arXiv:1601.07255
Wu, S., Chen, Y.C., Li, X., Wu, A.C., You, J.J., Zheng, W.S.: An enhanced deep feature representation for person re-identification. In: Proceedings of the WACV, pp. 1–8 (2016)
Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: Proceedings of the CVPR, pp. 1249–1258 (2016)
Yan, Y., Nie, F., Li, W., Gao, C., Yang, Y., Xu, D.: Image classification by cross-media active learning with privileged information. IEEE Trans. Multimed. 18(12), 2494–2502 (2016)
Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.H.: Saliency detection via graph-based manifold ranking. In: Proceedings of the CVPR, pp. 3166–3173 (2013)
Yang, X., Zhang, T., Xu, C.: A new discriminative coding method for image classification. Multimed. Syst. 21(2), 133–145 (2015)
Yang, Y., Ma, Z., Hauptmann, A.G., Sebe, N.: Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Trans. Multimed. 15(3), 661–669 (2013)
Yang, Y., Yang, J., Yan, J., Liao, S., Yi, D., Li, S.Z.: Salient color names for person re-identification. In: Proceedings of the ECCV, pp. 536–551 (2014)
Yi, D., Liao, S., Lei, Z., Li, S.Z.: Deep metric learning for person re-identification. In: Proceedings of the ICPR, pp. 34–39 (2014)
Zha, Z.J., Mei, T., Wang, Z., Hua, X.S.: Building a comprehensive ontology to refine video concept detection. In: Proceedings of the ACM International Workshop on Multimedia Information Retrieval, pp. 227–236 (2007)
Zhang, L., Xiang, T., Gong, S.: Learning a discriminative null space for person re-identification. In: Proceedings of the CVPR, pp. 1239–1248 (2016)
Zhang, L., Yang, C., Lu, H., Xiang, R., Yang, M.H.: Ranking saliency. IEEE Trans. Pattern Anal. Mach. Intell. 39(9), 1892–1904 (2017)
Zhang, W., Hu, S., Liu, K.: Learning compact appearance representation for video-based person re-identification (2017). arXiv:1702.06294
Zhao, R., Ouyang, W., Wang, X.: Person re-identification by salience matching. In: Proceedings of the ICCV, pp. 2528–2535 (2013)
Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: Proceedings of the CVPR, pp. 3586–3593 (2013)
Zhao, R., Ouyang, W., Wang, X.: Person re-identification by saliency learning. IEEE Trans. Pattern Anal. Mach. Intell. 39(2), 356–370 (2017)
Zheng, L., Bie, Z., Sun, Y., Wang, J., Wang, S., Su, C., Tian, Q.: MARS: a video benchmark for large-scale person re-identification. In: Proceedings of the ECCV, pp. 868–884 (2016)
Zheng, L., Huang, Y., Lu, H., Yang, Y.: Pose invariant embedding for deep person re-identification (2017). arXiv:1701.07732
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: Proceedings of the ICCV, pp. 1116–1124 (2015)
Zheng, L., Wang, S., Guo, P., Liang, H., Tian, Q.: Tensor index for large scale image retrieval. Multimed. Syst. 21(6), 569–579 (2015)
Zheng, L., Wang, S., Liu, Z., Tian, Q.: Fast image retrieval: query pruning and early termination. IEEE Trans. Multimed. 17(5), 648–659 (2015)
Zheng, L., Wang, S., Tian, Q.: Coupled binary embedding for large-scale image retrieval. IEEE Trans. Image Process. 23(8), 3368–3380 (2014)
Zheng, L., Wang, S., Tian, Q.: \(\mathcal{L}_p\)-norm IDF for scalable image retrieval. IEEE Trans. Image Process. 23(8), 3604–3617 (2014)
Zheng, L., Wang, S., Wang, J., Tian, Q.: Accurate image search with multi-scale contextual evidences. Int. J. Comput. Vis. 120(1), 1–13 (2016)
Zheng, L., Yang, Y., Tian, Q.: SIFT meets CNN: a decade survey of instance retrieval. IEEE Trans. Pattern Anal. Mach. Intell. (2017). https://doi.org/10.1109/TPAMI.2017.2709749
Zheng, L., Zhang, H., Sun, S., Chandraker, M., Yang, Y., Tian, Q.: Person re-identification in the wild. In: Proceedings of the CVPR, pp. 3346–3355 (2017)
Zheng, W.S., Gong, S., Xiang, T.: Associating groups of people. In: Proceedings of the BMVC, pp. 23.1–23.11 (2009)
Zheng, Z., Zheng, L., Yang, Y.: A discriminatively learned CNN embedding for person re-identification (2016). arXiv:1611.05666
Zheng, Z., Zheng, L., Yang, Y.: Unlabeled samples generated by GAN improve the person re-identification baseline in vitro (2017). arXiv:1701.07717
Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: Proceedings of the CVPR, pp. 3652–3661 (2017)
Zhu, F., Chu, Q., Yu, N.: Consistent matching based on boosted salience channels for group re-identification. In: Proceedings of the ICIP, pp. 4279–4283 (2016)
Acknowledgements
This work was supported in part by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (NSFC) under Grant 71421001, in part by the NSFC under Grants 61502073 and 61429201, and in part by ARO Grant W911NF-15-1-0290 and Faculty Research Gift Awards from NEC Laboratories of America and Blippar to Dr. Qi Tian.
Additional information
Communicated by T. Mei.
About this article
Cite this article
Zhu, F., Kong, X., Fu, H. et al. A novel two-stream saliency image fusion CNN architecture for person re-identification. Multimedia Systems 24, 569–582 (2018). https://doi.org/10.1007/s00530-017-0583-4
DOI: https://doi.org/10.1007/s00530-017-0583-4