A novel two-stream saliency image fusion CNN architecture for person re-identification

  • Regular Paper
  • Multimedia Systems

Abstract

Background interference, which arises from complex environments, is a critical problem for a robust person re-identification (re-ID) system: background noise can significantly compromise the feature learning and matching process. To reduce this interference, this paper proposes a saliency image embedding as a pedestrian descriptor. First, to eliminate the background of each pedestrian image, a saliency image is constructed with an unsupervised, manifold ranking-based saliency detection algorithm. Second, to compensate for errors and missing pedestrian details introduced during saliency image construction, a saliency image fusion (SIF) convolutional neural network (CNN) architecture is designed that takes both the original pedestrian image and the saliency image as input. We implement this idea in identification models built on several state-of-the-art backbone CNNs (i.e., CaffeNet, VGGNet-16, GoogLeNet and ResNet-50). We show that the pedestrian descriptor learned by the proposed SIF CNN architecture significantly improves on the baselines and achieves performance competitive with state-of-the-art person re-ID methods on three large-scale person re-ID benchmarks (i.e., Market-1501, DukeMTMC-reID and MARS).
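
The first step relies on graph-based manifold ranking. Assuming the formulation of Yang et al. [56], nodes (superpixels) are ranked against a set of query nodes by the closed form \(f^* = (D - \alpha W)^{-1} y\), where \(W\) is the affinity matrix, \(D\) its diagonal degree matrix and \(y\) the query indicator. The Python sketch below is not the authors' code: it solves that system on a hypothetical five-node chain, uses only the boundary-as-background query stage (the per-side queries and foreground re-ranking stage of [56] are omitted), and the toy colours, \(\sigma^2 = 0.1\) and \(\alpha = 0.99\) are illustrative assumptions.

```python
import math


def affinity(colors, edges, sigma2=0.1):
    """Toy affinity matrix: w_ij = exp(-|c_i - c_j| / sigma2) for each edge.

    `colors` stands in for per-superpixel mean colours (one scalar each) and
    `edges` lists adjacent node pairs; both are hypothetical toy inputs.
    """
    n = len(colors)
    w = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w[i][j] = w[j][i] = math.exp(-abs(colors[i] - colors[j]) / sigma2)
    return w


def manifold_rank(w, queries, alpha=0.99, iters=2000):
    """Approximate f* = (D - alpha * W)^-1 y with Jacobi iterations.

    D is diagonal with d_ii = sum_j w_ij, and y_i = 1 for query nodes.
    Every row of alpha * D^-1 W sums to alpha < 1, so the iteration
    converges (assumes no isolated nodes, i.e. every d_ii > 0).
    """
    n = len(w)
    y = [1.0 if i in queries else 0.0 for i in range(n)]
    d = [sum(row) for row in w]
    f = [0.0] * n
    for _ in range(iters):
        f = [(y[i] + alpha * sum(w[i][j] * f[j] for j in range(n))) / d[i]
             for i in range(n)]
    return f


def saliency_from_background(f):
    """Nodes ranked close to the background queries get low saliency."""
    top = max(f)
    return [1.0 - v / top for v in f]


# Toy "image": a chain of five superpixels whose two ends lie on the image
# boundary (background prior) and whose middle three form the pedestrian.
colors = [0.1, 0.9, 0.9, 0.9, 0.1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
scores = manifold_rank(affinity(colors, edges), queries={0, 4})
saliency = saliency_from_background(scores)
print([round(s, 3) for s in saliency])
```

The Jacobi loop is used only to keep the sketch dependency-free; a real implementation would solve the linear system (or precompute the inverse) with a numerical library.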

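The second step, feeding both the original image and its saliency image to a two-stream network, can be illustrated at the descriptor level. The abstract does not specify how the saliency image is formed or where the streams are fused, so this Python sketch makes three labelled assumptions: the saliency image is the pixel-wise product of the image and its saliency map, each CNN stream is replaced by a hypothetical `toy_stream` (per-channel mean colour), and fusion is concatenation of the L2-normalised stream outputs followed by Euclidean-distance ranking.

```python
import math


def saliency_image(image, saliency_map):
    """Weight each pixel of an H x W x C image by its saliency in [0, 1].

    Assumption: pixel-wise multiplication, so background pixels with
    saliency near 0 are suppressed.
    """
    return [[[c * s for c in pixel] for pixel, s in zip(row, s_row)]
            for row, s_row in zip(image, saliency_map)]


def toy_stream(image):
    """Hypothetical stand-in for one CNN stream: per-channel mean colour."""
    pixels = [p for row in image for p in row]
    return [sum(p[c] for p in pixels) / len(pixels) for c in range(len(pixels[0]))]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def describe(image, saliency_map):
    """Assumed fusion: concatenate the L2-normalised outputs of the
    original-image stream and the saliency-image stream, then renormalise."""
    original = l2_normalize(toy_stream(image))
    salient = l2_normalize(toy_stream(saliency_image(image, saliency_map)))
    return l2_normalize(original + salient)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# 1 x 2 RGB toy images: the left pixel is the pedestrian, the right one is
# background (saliency 0). "other_person" shares the query's background.
mask = [[1.0, 0.0]]
query = describe([[[0.9, 0.1, 0.1], [0.2, 0.6, 0.2]]], mask)
gallery = {
    "same_person": describe([[[0.85, 0.15, 0.1], [0.1, 0.1, 0.7]]], mask),
    "other_person": describe([[[0.1, 0.1, 0.9], [0.2, 0.6, 0.2]]], mask),
}
ranking = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
print(ranking)
```

Keeping both streams in the descriptor is one way the original-image stream could compensate for pedestrian regions that the saliency map wrongly suppresses; the toy data only illustrate the data flow, not the paper's results.
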
Figs. 1–9

Notes

  1. Note that we take the CaffeNet [21] backbone only as an example.

  2. The input image size is \(227\times 227\) for CaffeNet [21] and \(224\times 224\) for the other three CNN models (i.e., VGGNet-16 [39], GoogLeNet [42] and ResNet-50 [15]).

  3. The layer in question is the second-to-last fully connected layer for CaffeNet [21] and VGGNet-16 [39], and the last pooling layer for GoogLeNet [42] and ResNet-50 [15].
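
Notes 2 and 3 amount to a small per-backbone configuration, recorded in the Python sketch below. The names `BackboneSpec`, `note3_layer` and `center_crop_box` are hypothetical; reading Note 3's layer as the descriptor read-out layer is our assumption, and so is the resize-then-centre-crop preprocessing (a common Caffe convention, e.g. resizing to 256 × 256 first), which the notes do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSpec:
    input_size: int   # square input side length in pixels (Note 2)
    note3_layer: str  # layer named in Note 3 (assumed: descriptor read-out)


BACKBONES = {
    "CaffeNet": BackboneSpec(227, "second-to-last fully connected"),
    "VGGNet-16": BackboneSpec(224, "second-to-last fully connected"),
    "GoogLeNet": BackboneSpec(224, "last pooling"),
    "ResNet-50": BackboneSpec(224, "last pooling"),
}


def center_crop_box(resized_side, backbone):
    """(left, top, right, bottom) of a centred crop matching the backbone input.

    Assumes the image was first resized to resized_side x resized_side.
    """
    size = BACKBONES[backbone].input_size
    if resized_side < size:
        raise ValueError("resized image is smaller than the network input")
    offset = (resized_side - size) // 2
    return (offset, offset, offset + size, offset + size)


print(center_crop_box(256, "CaffeNet"))   # (14, 14, 241, 241)
print(center_crop_box(256, "ResNet-50"))  # (16, 16, 240, 240)
```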

References

  1. Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: Proceedings of the CVPR, pp. 1597–1604 (2009)

  2. Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: Proceedings of the CVPR, pp. 3908–3916 (2015)

  3. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of the COMPSTAT’2010, pp. 177–186 (2010)

  4. Chang, X., Nie, F., Wang, S., Yang, Y., Zhou, X., Zhang, C.: Compound rank-\(k\) projections for bilinear analysis. IEEE Trans. Neural Netw. Learn. Syst. 27(7), 1502–1513 (2016)

  5. Chen, D., Yuan, Z., Chen, B., Zheng, N.: Similarity learning with spatial constraints for person re-identification. In: Proceedings of the CVPR, pp. 1268–1277 (2016)

  6. Cheng, D., Gong, Y., Zhou, S., Wang, J., Zheng, N.: Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In: Proceedings of the CVPR, pp. 1335–1344 (2016)

  7. Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the ICML, pp. 209–216 (2007)

  8. Dehghan, A., Modiri Assari, S., Shah, M.: GMMCP tracker: globally optimal generalized maximum multi clique problem for multiple object tracking. In: Proceedings of the CVPR, pp. 4091–4099 (2015)

  9. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the CVPR, pp. 248–255 (2009)

  10. Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance comparison for person re-identification. Pattern Recogn. 48(10), 2993–3003 (2015)

  11. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)

  12. Fu, H., Zhao, H., Kong, X., Zhang, X.: BHOG: binary descriptor for sketch-based image retrieval. Multimed. Syst. 22(1), 127–136 (2016)

  13. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the CVPR, pp. 580–587 (2014)

  14. Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Proceedings of the ECCV, pp. 262–275 (2008)

  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the CVPR, pp. 770–778 (2016)

  16. Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Proceedings of the Scandinavian Conference on Image Analysis, pp. 91–102 (2011)

  17. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678 (2014)

  18. Jiang, Z., Davis, L.S.: Submodular salient region detection. In: Proceedings of the CVPR, pp. 2043–2050 (2013)

  19. Jose, C., Fleuret, F.: Scalable metric learning via weighted approximate rank component analysis. In: Proceedings of the ECCV, pp. 875–890 (2016)

  20. Koestinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Large scale metric learning from equivalence constraints. In: Proceedings of the CVPR, pp. 2288–2295 (2012)

  21. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of the NIPS, pp. 1097–1105 (2012)

  22. Li, W., Wang, X.: Locally aligned feature transforms across views. In: Proceedings of the CVPR, pp. 3594–3601 (2013)

  23. Li, W., Zhao, R., Wang, X.: Human reidentification with transferred metric learning. In: Proceedings of the ACCV, pp. 31–44 (2012)

  24. Li, W., Zhao, R., Xiao, T., Wang, X.: DeepReID: deep filter pairing neural network for person re-identification. In: Proceedings of the CVPR, pp. 152–159 (2014)

  25. Li, X.: Tag relevance fusion for social image retrieval. Multimed. Syst. 23(1), 29–40 (2017)

  26. Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the CVPR, pp. 2197–2206 (2015)

  27. Lin, Y., Zheng, L., Zheng, Z., Wu, Y., Yang, Y.: Improving person re-identification by attribute and identity learning (2017). arXiv:1703.07220

  28. Liu, H., Feng, J., Qi, M., Jiang, J., Yan, S.: End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. 26(7), 3492–3506 (2017)

  29. Liu, J., Li, Z., Lu, H.: Sparse semantic metric learning for image retrieval. Multimed. Syst. 20(6), 635–643 (2014)

  30. Liu, Y., Shao, Y., Sun, F.: Person re-identification based on visual saliency. In: Proceedings of the Intelligent Systems Design and Applications (ISDA), pp. 884–889 (2012)

  31. Lu, S., Mahadevan, V., Vasconcelos, N.: Learning optimal seeds for diffusion-based salient object detection. In: Proceedings of the CVPR, pp. 2790–2797 (2014)

  32. Martinel, N., Das, A., Micheloni, C., Roy-Chowdhury, A.K.: Temporal model adaptation for person re-identification. In: Proceedings of the ECCV, pp. 858–877 (2016)

  33. Martinel, N., Micheloni, C., Foresti, G.L.: Kernelized saliency-based person re-identification through multiple metric learning. IEEE Trans. Image Process. 24(12), 5645–5658 (2015)

  34. Matsukawa, T., Okabe, T., Suzuki, E., Sato, Y.: Hierarchical Gaussian descriptor for person re-identification. In: Proceedings of the CVPR, pp. 1363–1372 (2016)

  35. Qi, G.J., Hua, X.S., Rui, Y., Tang, J., Mei, T., Zhang, H.J.: Correlative multi-label video annotation. In: Proceedings of the ACM International Conference on Multimedia, pp. 17–26 (2007)

  36. Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from BoW: unsupervised fine-tuning with hard examples (2016). arXiv:1604.02426

  37. Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: Proceedings of the ECCV Workshop, pp. 17–35 (2016)

  38. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Proceedings of the CVPR, pp. 815–823 (2015)

  39. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556

  40. Su, C., Zhang, S., Xing, J., Gao, W., Tian, Q.: Deep attributes driven multi-camera person re-identification. In: Proceedings of the ECCV, pp. 475–491 (2016)

  41. Sun, Y., Zheng, L., Deng, W., Wang, S.: SVDNet for pedestrian retrieval (2017). arXiv:1703.05693

  42. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the CVPR, pp. 1–9 (2015)

  43. Tesfaye, Y.T., Zemene, E., Prati, A., Pelillo, M., Shah, M.: Multi-target tracking in multiple non-overlapping cameras using constrained dominant sets (2017). arXiv:1706.06196

  44. Tong, N., Lu, H., Ruan, X., Yang, M.H.: Salient object detection via bootstrap learning. In: Proceedings of the CVPR, pp. 1884–1892 (2015)

  45. Ustinova, E., Ganin, Y., Lempitsky, V.: Multi-region bilinear convolutional neural networks for person re-identification. In: Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6 (2017)

  46. Varior, R.R., Haloi, M., Wang, G.: Gated siamese convolutional neural network architecture for human re-identification. In: Proceedings of the ECCV, pp. 791–808 (2016)

  47. Varior, R.R., Shuai, B., Lu, J., Xu, D., Wang, G.: A siamese long short-term memory architecture for human re-identification. In: Proceedings of the ECCV, pp. 135–153 (2016)

  48. Wang, H., Gong, S., Xiang, T.: Unsupervised learning of generative topic saliency for person re-identification. In: Proceedings of the BMVC (2014)

  49. Wang, M., Konrad, J., Ishwar, P., Jing, K., Rowley, H.: Image saliency: from intrinsic to extrinsic context. In: Proceedings of the CVPR, pp. 417–424 (2011)

  50. Wei, Y., Wen, F., Zhu, W., Sun, J.: Geodesic saliency using background priors. In: Proceedings of the ECCV, pp. 29–42 (2012)

  51. Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. In: Proceedings of the NIPS, pp. 1473–1480 (2005)

  52. Wu, L., Shen, C., Hengel, A.v.d.: PersonNet: person re-identification with deep convolutional neural networks (2016). arXiv:1601.07255

  53. Wu, S., Chen, Y.C., Li, X., Wu, A.C., You, J.J., Zheng, W.S.: An enhanced deep feature representation for person re-identification. In: Proceedings of the WACV, pp. 1–8 (2016)

  54. Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: Proceedings of the CVPR, pp. 1249–1258 (2016)

  55. Yan, Y., Nie, F., Li, W., Gao, C., Yang, Y., Xu, D.: Image classification by cross-media active learning with privileged information. IEEE Trans. Multimed. 18(12), 2494–2502 (2016)

  56. Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.H.: Saliency detection via graph-based manifold ranking. In: Proceedings of the CVPR, pp. 3166–3173 (2013)

  57. Yang, X., Zhang, T., Xu, C.: A new discriminative coding method for image classification. Multimed. Syst. 21(2), 133–145 (2015)

  58. Yang, Y., Ma, Z., Hauptmann, A.G., Sebe, N.: Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Trans. Multimed. 15(3), 661–669 (2013)

  59. Yang, Y., Yang, J., Yan, J., Liao, S., Yi, D., Li, S.Z.: Salient color names for person re-identification. In: Proceedings of the ECCV, pp. 536–551 (2014)

  60. Yi, D., Liao, S., Lei, Z., Li, S.Z.: Deep metric learning for person re-identification. In: Proceedings of the ICPR, pp. 34–39 (2014)

  61. Zha, Z.J., Mei, T., Wang, Z., Hua, X.S.: Building a comprehensive ontology to refine video concept detection. In: Proceedings of the ACM International Workshop on Multimedia Information Retrieval, pp. 227–236 (2007)

  62. Zhang, L., Xiang, T., Gong, S.: Learning a discriminative null space for person re-identification. In: Proceedings of the CVPR, pp. 1239–1248 (2016)

  63. Zhang, L., Yang, C., Lu, H., Xiang, R., Yang, M.H.: Ranking saliency. IEEE Trans. Pattern Anal. Mach. Intell. 39(9), 1892–1904 (2017)

  64. Zhang, W., Hu, S., Liu, K.: Learning compact appearance representation for video-based person re-identification (2017). arXiv:1702.06294

  65. Zhao, R., Ouyang, W., Wang, X.: Person re-identification by salience matching. In: Proceedings of the ICCV, pp. 2528–2535 (2013)

  66. Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: Proceedings of the CVPR, pp. 3586–3593 (2013)

  67. Zhao, R., Ouyang, W., Wang, X.: Person re-identification by saliency learning. IEEE Trans. Pattern Anal. Mach. Intell. 39(2), 356–370 (2017)

  68. Zheng, L., Bie, Z., Sun, Y., Wang, J., Wang, S., Su, C., Tian, Q.: MARS: a video benchmark for large-scale person re-identification. In: Proceedings of the ECCV, pp. 868–884 (2016)

  69. Zheng, L., Huang, Y., Lu, H., Yang, Y.: Pose invariant embedding for deep person re-identification (2017). arXiv:1701.07732

  70. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: Proceedings of the ICCV, pp. 1116–1124 (2015)

  71. Zheng, L., Wang, S., Guo, P., Liang, H., Tian, Q.: Tensor index for large scale image retrieval. Multimed. Syst. 21(6), 569–579 (2015)

  72. Zheng, L., Wang, S., Liu, Z., Tian, Q.: Fast image retrieval: query pruning and early termination. IEEE Trans. Multimed. 17(5), 648–659 (2015)

  73. Zheng, L., Wang, S., Tian, Q.: Coupled binary embedding for large-scale image retrieval. IEEE Trans. Image Process. 23(8), 3368–3380 (2014)

  74. Zheng, L., Wang, S., Tian, Q.: \(\mathcal{L}_p\)-norm IDF for scalable image retrieval. IEEE Trans. Image Process. 23(8), 3604–3617 (2014)

  75. Zheng, L., Wang, S., Wang, J., Tian, Q.: Accurate image search with multi-scale contextual evidences. Int. J. Comput. Vis. 120(1), 1–13 (2016)

  76. Zheng, L., Yang, Y., Tian, Q.: SIFT meets CNN: a decade survey of instance retrieval. IEEE Trans. Pattern Anal. Mach. Intell. (2017). https://doi.org/10.1109/TPAMI.2017.2709749

  77. Zheng, L., Zhang, H., Sun, S., Chandraker, M., Yang, Y., Tian, Q.: Person re-identification in the wild. In: Proceedings of the CVPR, pp. 3346–3355 (2017)

  78. Zheng, W.S., Gong, S., Xiang, T.: Associating groups of people. In: Proceedings of the BMVC, pp. 23.1–23.11 (2009)

  79. Zheng, Z., Zheng, L., Yang, Y.: A discriminatively learned CNN embedding for person re-identification (2016). arXiv:1611.05666

  80. Zheng, Z., Zheng, L., Yang, Y.: Unlabeled samples generated by GAN improve the person re-identification baseline in vitro (2017). arXiv:1701.07717

  81. Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: Proceedings of the CVPR, pp. 3652–3661 (2017)

  82. Zhu, F., Chu, Q., Yu, N.: Consistent matching based on boosted salience channels for group re-identification. In: Proceedings of the ICIP, pp. 4279–4283 (2016)

Acknowledgements

This work was supported in part by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (NSFC) under Grant 71421001, and in part by the NSFC under Grants 61502073 and 61429201. The work of Dr. Qi Tian was supported in part by ARO Grant W911NF-15-1-0290 and by Faculty Research Gift Awards from NEC Laboratories of America and Blippar.

Author information

Corresponding author

Correspondence to Xiangwei Kong.

Additional information

Communicated by T. Mei.

About this article

Cite this article

Zhu, F., Kong, X., Fu, H. et al. A novel two-stream saliency image fusion CNN architecture for person re-identification. Multimedia Systems 24, 569–582 (2018). https://doi.org/10.1007/s00530-017-0583-4

  • DOI: https://doi.org/10.1007/s00530-017-0583-4
