Scale-fusion framework for improving video-based person re-identification performance

  • Original Article
  • Neural Computing and Applications

Abstract

Video-based person re-identification (re-id), which aims to match people across videos captured by non-overlapping camera views, has attracted considerable research interest. In this paper, we first propose a novel hybrid 2D and 3D convolution-based recurrent neural network (HCRN) for the video-based person re-id task. Specifically, the 3D convolutional module explores local, short-term, fast-varying motion information, while the recurrent layer leverages global, long-term spatial–temporal information. Based on HCRN, we design a scale-fusion framework that makes full use of features at different scales to further improve video-based person re-id performance. More concretely, the scale-fusion framework preserves a complete subnetwork similar to HCRN for each scale to extract features, and exchanges information between all subnetworks at several stages of the framework. In addition, we propose a training method called species invasion, which further improves the performance of HCRN and the scale-fusion framework by exploiting a large amount of unlabeled data. Experimental results on the publicly available PRID 2011, iLIDS-VID and MARS multi-shot pedestrian re-id datasets demonstrate the effectiveness of the proposed HCRN, scale-fusion framework and species invasion training method.
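The abstract says only that the subnetworks for different scales exchange information at several stages; it does not give the fusion rule. The toy sketch below shows one plausible reading, in which each scale receives the sum of all scales resampled to its own resolution. The resample-and-sum rule, the 1D list features, and the helper names `downsample`, `upsample`, `resample` and `exchange` are all assumptions made for illustration, not the paper's implementation.

```python
# Toy sketch of one cross-scale exchange step.
# ASSUMPTION: fusion = resample every scale to the target length, then sum.
# The source does not specify the actual rule; features are 1D lists for brevity.

def downsample(x, factor):
    """Average-pool a 1D feature list by an integer factor."""
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]

def upsample(x, factor):
    """Nearest-neighbour upsample a 1D feature list by an integer factor."""
    return [v for v in x for _ in range(factor)]

def resample(x, target_len):
    """Bring x to target_len; lengths are assumed to divide evenly."""
    if len(x) == target_len:
        return list(x)
    if len(x) > target_len:
        return downsample(x, len(x) // target_len)
    return upsample(x, target_len // len(x))

def exchange(features):
    """Give each scale the element-wise sum of all scales at its own length."""
    fused = []
    for target in features:
        resampled = [resample(src, len(target)) for src in features]
        fused.append([sum(vals) for vals in zip(*resampled)])
    return fused

# Hypothetical per-scale features from three subnetworks (lengths 4, 2, 1).
scales = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [100.0]]
fused = exchange(scales)
print(fused)  # [[111.0, 112.0, 123.0, 124.0], [111.5, 123.5], [117.5]]
```

Every scale keeps its own resolution after the exchange, so each subnetwork can keep processing at its native scale while still seeing information from the others. In a real model the lists would be feature maps and the resampling would typically be learned or interpolated.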

Acknowledgements

The authors would like to thank the editors and anonymous reviewers for their constructive comments and suggestions. This work was supported by the NSFC-Key Project under Grant No. 61933013, the NSFC-Key Project of General Technology Fundamental Research United Fund under Grant No. U1736211, the Key Project of Natural Science Foundation of Hubei Province under Grant No. 2018CFA024, the Natural Science Foundation of Guangdong Province under Grant No. 2019A1515011076, the National Key Research and Development Program of China under Grant No. 2017YFB0202001, the National Natural Science Foundation of China under Grant No. 61672208, the Higher Education Institution Key Research Projects of Henan Province under Grant No. 19A520001, the Key Scientific and Technological Project of Henan Province under Grant No. 192102210277, and the Project of Chinese Postdoctoral Science Foundation under Grant No. 2019M652624.

Author information

Corresponding author

Correspondence to Xiao-Yuan Jing.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cheng, L., Jing, XY., Zhu, X. et al. Scale-fusion framework for improving video-based person re-identification performance. Neural Comput & Applic 32, 12841–12858 (2020). https://doi.org/10.1007/s00521-020-04730-z

  • DOI: https://doi.org/10.1007/s00521-020-04730-z
