A novel two-stream saliency image fusion CNN architecture for person re-identification

Zhu, Fuqing; Kong, Xiangwei; Fu, Haiyan; Tian, Qi

doi:10.1007/s00530-017-0583-4

Regular Paper
Published: 29 December 2017

Volume 24, pages 569–582, (2018)
Multimedia Systems

Fuqing Zhu¹, Xiangwei Kong¹ (ORCID: orcid.org/0000-0002-0851-6752), Haiyan Fu¹ & Qi Tian²

Abstract

Background interference, which arises from complex environments, is a critical problem for a robust person re-identification (re-ID) system, because background noise can significantly compromise feature learning and matching. To reduce this interference, this paper proposes a saliency image embedding as a pedestrian descriptor. First, to eliminate the background of each pedestrian image, a saliency image is constructed using an unsupervised manifold ranking-based saliency detection algorithm. Second, to compensate for errors and missing pedestrian details introduced during saliency image construction, a saliency image fusion (SIF) convolutional neural network (CNN) architecture is designed that takes both the original pedestrian image and the saliency image as input. We implement the idea in identification models built on several state-of-the-art backbone CNNs (i.e., CaffeNet, VGGNet-16, GoogLeNet and ResNet-50). The pedestrian descriptor learned by the proposed SIF CNN architecture provides a significant improvement over the baselines and performs competitively with state-of-the-art person re-ID methods on three large-scale person re-ID benchmarks (i.e., Market-1501, DukeMTMC-reID and MARS).
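
The manifold ranking step mentioned in the abstract can be illustrated with a toy sketch. It is not the authors' saliency pipeline: the five-node path graph, the unit edge weights, the value of `alpha`, and the function name `manifold_rank` are all assumptions made for illustration. The sketch uses the standard iterative form of manifold ranking, f ← αSf + (1 − α)y with S = D^(-1/2) W D^(-1/2), where y marks the query nodes.

```python
import math


def manifold_rank(weights, query, alpha=0.5, iters=200):
    """Rank graph nodes by relevance to the query nodes (toy illustration).

    weights: symmetric affinity matrix W as a list of lists.
    query:   indicator vector y (1.0 for query nodes, 0.0 otherwise).
    alpha:   propagation strength in (0, 1); 0.5 is an arbitrary choice here.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetrically normalised affinity S = D^(-1/2) W D^(-1/2).
    s = [
        [
            weights[i][j] / math.sqrt(degree[i] * degree[j]) if weights[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    f = list(query)
    for _ in range(iters):
        # Spread scores along edges, then re-inject the query signal.
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * query[i]
            for i in range(n)
        ]
    return f


# Hypothetical example: five nodes in a chain, with node 0 as the only query.
n = 5
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
query = [1.0, 0.0, 0.0, 0.0, 0.0]
scores = manifold_rank(weights, query)
```

In this toy setting the score falls off with graph distance from the query node. In a saliency detector the nodes would typically be image regions and the resulting scores would form the saliency map, but how the paper builds its graph and queries is not stated in the abstract.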

Notes

Note that the backbone CaffeNet [21] model is used only as an example.
The size is \(227\times 227\) for CaffeNet [21] and \(224\times 224\) for the other three CNN models (i.e., VGGNet-16 [39], GoogLeNet [42] and ResNet-50 [15]).
For CaffeNet [21] and VGGNet-16 [39], this is the second-to-last fully connected layer; for GoogLeNet [42] and ResNet-50 [15], it is the last pooling layer.
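
The per-backbone settings in these notes can be collected in a small lookup table, sketched below. The names `BackboneSpec`, `BACKBONES` and `input_shape` are hypothetical, and the sketch assumes that "size" refers to the network input image size and that the named layer is where the descriptor is read out; neither is confirmed by the notes themselves. Layer labels are descriptive strings, not framework layer names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSpec:
    input_size: int      # side of the square input in pixels (assumed meaning of "size")
    feature_layer: str   # descriptive label taken from the notes


# Values copied from the notes above; the structure itself is illustrative.
BACKBONES = {
    "CaffeNet": BackboneSpec(227, "second-to-last fully connected layer"),
    "VGGNet-16": BackboneSpec(224, "second-to-last fully connected layer"),
    "GoogLeNet": BackboneSpec(224, "last pooling layer"),
    "ResNet-50": BackboneSpec(224, "last pooling layer"),
}


def input_shape(name):
    """Return the (height, width) expected by the named backbone."""
    side = BACKBONES[name].input_size
    return (side, side)
```

A call such as `input_shape("CaffeNet")` then gives the resize target for that backbone, which keeps the one exception (CaffeNet) out of the preprocessing code.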

References

Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: Proceedings of the CVPR, pp. 1597–1604 (2009)
Ahmed, E., Jones, M., Marks, T.K.: An improved deep learning architecture for person re-identification. In: Proceedings of the CVPR, pp. 3908–3916 (2015)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of the COMPSTAT’2010, pp. 177–186 (2010)
Chang, X., Nie, F., Wang, S., Yang, Y., Zhou, X., Zhang, C.: Compound rank-\(k\) projections for bilinear analysis. IEEE Trans. Neural Netw. Learn. Syst. 27(7), 1502–1513 (2016)
Chen, D., Yuan, Z., Chen, B., Zheng, N.: Similarity learning with spatial constraints for person re-identification. In: Proceedings of the CVPR, pp. 1268–1277 (2016)
Cheng, D., Gong, Y., Zhou, S., Wang, J., Zheng, N.: Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In: Proceedings of the CVPR, pp. 1335–1344 (2016)
Davis, J.V., Kulis, B., Jain, P., Sra, S., Dhillon, I.S.: Information-theoretic metric learning. In: Proceedings of the ICML, pp. 209–216 (2007)
Dehghan, A., Modiri Assari, S., Shah, M.: Gmmcp tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking. In: Proceedings of the CVPR, pp. 4091–4099 (2015)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: Proceedings of the CVPR, pp. 248–255 (2009)
Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance comparison for person re-identification. Pattern Recogn. 48(10), 2993–3003 (2015)
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2010)
Fu, H., Zhao, H., Kong, X., Zhang, X.: Bhog: binary descriptor for sketch-based image retrieval. Multimed. Syst. 22(1), 127–136 (2016)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the CVPR, pp. 580–587 (2014)
Gray, D., Tao, H.: Viewpoint invariant pedestrian recognition with an ensemble of localized features. In: Proceedings of the ECCV, pp. 262–275 (2008)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the CVPR, pp. 770–778 (2016)
Hirzer, M., Beleznai, C., Roth, P.M., Bischof, H.: Person re-identification by descriptive and discriminative classification. In: Proceedings of the Scandinavian Conference on Image analysis, pp. 91–102 (2011)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, pp. 675–678 (2014)
Jiang, Z., Davis, L.S.: Submodular salient region detection. In: Proceedings of the CVPR, pp. 2043–2050 (2013)
Jose, C., Fleuret, F.: Scalable metric learning via weighted approximate rank component analysis. In: Proceedings of the ECCV, pp. 875–890 (2016)
Koestinger, M., Hirzer, M., Wohlhart, P., Roth, P.M., Bischof, H.: Large scale metric learning from equivalence constraints. In: Proceedings of the CVPR, pp. 2288–2295 (2012)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Proceedings of the NIPS, pp. 1097–1105 (2012)
Li, W., Wang, X.: Locally aligned feature transforms across views. In: Proceedings of the CVPR, pp. 3594–3601 (2013)
Li, W., Zhao, R., Wang, X.: Human reidentification with transferred metric learning. In: Proceedings of the ACCV, pp. 31–44 (2012)
Li, W., Zhao, R., Xiao, T., Wang, X.: Deepreid: Deep filter pairing neural network for person re-identification. In: Proceedings of the CVPR, pp. 152–159 (2014)
Li, X.: Tag relevance fusion for social image retrieval. Multimed. Syst. 23(1), 29–40 (2017)
Liao, S., Hu, Y., Zhu, X., Li, S.Z.: Person re-identification by local maximal occurrence representation and metric learning. In: Proceedings of the CVPR, pp. 2197–2206 (2015)
Lin, Y., Zheng, L., Zheng, Z., Wu, Y., Yang, Y.: Improving person re-identification by attribute and identity learning (2017). arXiv:1703.07220
Liu, H., Feng, J., Qi, M., Jiang, J., Yan, S.: End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. 26(7), 3492–3506 (2017)
Liu, J., Li, Z., Lu, H.: Sparse semantic metric learning for image retrieval. Multimed. Syst. 20(6), 635–643 (2014)
Liu, Y., Shao, Y., Sun, F.: Person re-identification based on visual saliency. In: Proceedings of the Intelligent Systems Design and Applications (ISDA), pp. 884–889 (2012)
Lu, S., Mahadevan, V., Vasconcelos, N.: Learning optimal seeds for diffusion-based salient object detection. In: Proceedings of the CVPR, pp. 2790–2797 (2014)
Martinel, N., Das, A., Micheloni, C., Roy-Chowdhury, A.K.: Temporal model adaptation for person re-identification. In: Proceedings of the ECCV, pp. 858–877 (2016)
Martinel, N., Micheloni, C., Foresti, G.L.: Kernelized saliency-based person re-identification through multiple metric learning. IEEE Trans. Image Process. 24(12), 5645–5658 (2015)
Matsukawa, T., Okabe, T., Suzuki, E., Sato, Y.: Hierarchical gaussian descriptor for person re-identification. In: Proceedings of the CVPR, pp. 1363–1372 (2016)
Qi, G.J., Hua, X.S., Rui, Y., Tang, J., Mei, T., Zhang, H.J.: Correlative multi-label video annotation. In: Proceedings of the ACM International Conference on Multimedia, pp. 17–26 (2007)
Radenović, F., Tolias, G., Chum, O.: CNN image retrieval learns from bow: unsupervised fine-tuning with hard examples (2016). arXiv:1604.02426
Ristani, E., Solera, F., Zou, R., Cucchiara, R., Tomasi, C.: Performance measures and a data set for multi-target, multi-camera tracking. In: Proceedings of the ECCV Workshop, pp. 17–35 (2016)
Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the CVPR, pp. 815–823 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556
Su, C., Zhang, S., Xing, J., Gao, W., Tian, Q.: Deep attributes driven multi-camera person re-identification. In: Proceedings of the ECCV, pp. 475–491 (2016)
Sun, Y., Zheng, L., Deng, W., Wang, S.: Svdnet for pedestrian retrieval (2017). arXiv:1703.05693
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the CVPR, pp. 1–9 (2015)
Tesfaye, Y.T., Zemene, E., Prati, A., Pelillo, M., Shah, M.: Multi-target tracking in multiple non-overlapping cameras using constrained dominant sets (2017). arXiv:1706.06196
Tong, N., Lu, H., Ruan, X., Yang, M.H.: Salient object detection via bootstrap learning. In: Proceedings of the CVPR, pp. 1884–1892 (2015)
Ustinova, E., Ganin, Y., Lempitsky, V.: Multi-region bilinear convolutional neural networks for person re-identification. In: Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6 (2017)
Varior, R.R., Haloi, M., Wang, G.: Gated siamese convolutional neural network architecture for human re-identification. In: Proceedings of the ECCV, pp. 791–808 (2016)
Varior, R.R., Shuai, B., Lu, J., Xu, D., Wang, G.: A siamese long short-term memory architecture for human re-identification. In: Proceedings of the ECCV, pp. 135–153 (2016)
Wang, H., Gong, S., Xiang, T.: Unsupervised learning of generative topic saliency for person re-identification. In: Proceedings of the BMVC (2014)
Wang, M., Konrad, J., Ishwar, P., Jing, K., Rowley, H.: Image saliency: from intrinsic to extrinsic context. In: Proceedings of the CVPR, pp. 417–424 (2011)
Wei, Y., Wen, F., Zhu, W., Sun, J.: Geodesic saliency using background priors. In: Proceedings of the ECCV, pp. 29–42 (2012)
Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. In: Proceedings of the NIPS, pp. 1473–1480 (2005)
Wu, L., Shen, C., Hengel, A.v.d.: Personnet: Person re-identification with deep convolutional neural networks (2016). arXiv:1601.07255
Wu, S., Chen, Y.C., Li, X., Wu, A.C., You, J.J., Zheng, W.S.: An enhanced deep feature representation for person re-identification. In: Proceedings of the WACV, pp. 1–8 (2016)
Xiao, T., Li, H., Ouyang, W., Wang, X.: Learning deep feature representations with domain guided dropout for person re-identification. In: Proceedings of the CVPR, pp. 1249–1258 (2016)
Yan, Y., Nie, F., Li, W., Gao, C., Yang, Y., Xu, D.: Image classification by cross-media active learning with privileged information. IEEE Trans. Multimed. 18(12), 2494–2502 (2016)
Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.H.: Saliency detection via graph-based manifold ranking. In: Proceedings of the CVPR, pp. 3166–3173 (2013)
Yang, X., Zhang, T., Xu, C.: A new discriminative coding method for image classification. Multimed. Syst. 21(2), 133–145 (2015)
Yang, Y., Ma, Z., Hauptmann, A.G., Sebe, N.: Feature selection for multimedia analysis by sharing information among multiple tasks. IEEE Trans. Multimed. 15(3), 661–669 (2013)
Yang, Y., Yang, J., Yan, J., Liao, S., Yi, D., Li, S.Z.: Salient color names for person re-identification. In: Proceedings of the ECCV, pp. 536–551 (2014)
Yi, D., Liao, S., Lei, Z., Li, S.Z.: Deep metric learning for person re-identification. In: Proceedings of the ICPR, pp. 34–39 (2014)
Zha, Z.J., Mei, T., Wang, Z., Hua, X.S.: Building a comprehensive ontology to refine video concept detection. In: Proceedings of the ACM International on Multimedia Information Retrieval Workshop, pp. 227–236 (2007)
Zhang, L., Xiang, T., Gong, S.: Learning a discriminative null space for person re-identification. In: Proceedings of the CVPR, pp. 1239–1248 (2016)
Zhang, L., Yang, C., Lu, H., Xiang, R., Yang, M.H.: Ranking saliency. IEEE Trans. Pattern Anal. Mach. Intell. 39(9), 1892–1904 (2017)
Zhang, W., Hu, S., Liu, K.: Learning compact appearance representation for video-based person re-identification (2017). arXiv:1702.06294
Zhao, R., Ouyang, W., Wang, X.: Person re-identification by salience matching. In: Proceedings of the ICCV, pp. 2528–2535 (2013)
Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: Proceedings of the CVPR, pp. 3586–3593 (2013)
Zhao, R., Ouyang, W., Wang, X.: Person re-identification by saliency learning. IEEE Trans. Pattern Anal. Mach. Intell. 39(2), 356–370 (2017)
Zheng, L., Bie, Z., Sun, Y., Wang, J., Wang, S., Su, C., Tian, Q.: Mars: A video benchmark for large-scale person re-identification. In: Proceedings of the ECCV, pp. 868–884 (2016)
Zheng, L., Huang, Y., Lu, H., Yang, Y.: Pose invariant embedding for deep person re-identification (2017). arXiv:1701.07732
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: Proceedings of the ICCV, pp. 1116–1124 (2015)
Zheng, L., Wang, S., Guo, P., Liang, H., Tian, Q.: Tensor index for large scale image retrieval. Multimed. Syst. 21(6), 569–579 (2015)
Zheng, L., Wang, S., Liu, Z., Tian, Q.: Fast image retrieval: query pruning and early termination. IEEE Trans. Multimed. 17(5), 648–659 (2015)
Zheng, L., Wang, S., Tian, Q.: Coupled binary embedding for large-scale image retrieval. IEEE Trans. Image Process. 23(8), 3368–3380 (2014)
Zheng, L., Wang, S., Tian, Q.: \(\mathcal{L}_{p}\)-norm idf for scalable image retrieval. IEEE Trans. Image Process. 23(8), 3604–3617 (2014)
Zheng, L., Wang, S., Wang, J., Tian, Q.: Accurate image search with multi-scale contextual evidences. Int. J. Comput. Vis. 120(1), 1–13 (2016)
Zheng, L., Yang, Y., Tian, Q.: Sift meets cnn: a decade survey of instance retrieval. IEEE Trans. Pattern Anal. Mach. Intell. (2017). https://doi.org/10.1109/TPAMI.2017.2709749
Zheng, L., Zhang, H., Sun, S., Chandraker, M., Yang, Y., Tian, Q.: Person re-identification in the wild. In: Proceedings of the CVPR, pp. 3346–3355 (2017)
Zheng, W.S., Gong, S., Xiang, T.: Associating groups of people. In: Proceedings of the BMVC, pp. 23.1–23.11 (2009)
Zheng, Z., Zheng, L., Yang, Y.: A discriminatively learned CNN embedding for person re-identification (2016). arXiv:1611.05666
Zheng, Z., Zheng, L., Yang, Y.: Unlabeled samples generated by GAN improve the person re-identification baseline in vitro (2017). arXiv:1701.07717
Zhong, Z., Zheng, L., Cao, D., Li, S.: Re-ranking person re-identification with k-reciprocal encoding. In: Proceedings of the CVPR, pp. 3652–3661 (2017)
Zhu, F., Chu, Q., Yu, N.: Consistent matching based on boosted salience channels for group re-identification. In: Proceedings of the ICIP, pp. 4279–4283 (2016)

Acknowledgements

This work was supported in part by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (NSFC) under Grant 71421001, in part by the National Natural Science Foundation of China (NSFC) under Grant 61502073 and Grant 61429201, and in part (to Dr. Qi Tian) by ARO Grant W911NF-15-1-0290 and by Faculty Research Gift Awards from NEC Laboratories of America and Blippar.

Author information

Authors and Affiliations

School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, China
Fuqing Zhu, Xiangwei Kong & Haiyan Fu
University of Texas at San Antonio, San Antonio, TX, 78249, USA
Qi Tian

Corresponding author

Correspondence to Xiangwei Kong.

Additional information

Communicated by T. Mei.

About this article

Cite this article

Zhu, F., Kong, X., Fu, H. et al. A novel two-stream saliency image fusion CNN architecture for person re-identification. Multimedia Systems 24, 569–582 (2018). https://doi.org/10.1007/s00530-017-0583-4

Received: 19 June 2017
Accepted: 26 December 2017
Published: 29 December 2017
Issue Date: October 2018
DOI: https://doi.org/10.1007/s00530-017-0583-4
