Selective deep ensemble for instance retrieval

Multimedia Tools and Applications

Abstract

In public security systems, demand for visual instance retrieval is growing explosively, especially for large-scale image and video databases. Because of their wide range of applications in surveillance scenarios, this paper focuses on retrieval tasks centered on ‘vehicle’ and ‘pedestrian’ targets. Many previous CNN-based methods do not exploit the ensemble capability of different models and therefore achieve limited accuracy, since any single deep architecture is not comprehensive. In addition, some features in the original deep representation are useless for retrieval, whereas an attention-aware compact representation is much more efficient and effective. To address these problems, we propose a Selective Deep Ensemble (SDE) framework, inspired by the attention mechanism, that combines various models and features in a complementary way. We demonstrate that a large improvement can be obtained with only a slight increase in computation cost. Finally, we evaluate performance on three public instance-retrieval datasets, VehicleID, VeRi and Market-1501, and outperform state-of-the-art methods by a large margin.

Acknowledgements

The authors of this paper are members of the Shanghai Engineering Research Center of Intelligent Video Surveillance. Dr. Lei Song is also a visiting researcher with the Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China. Our research was sponsored by the following projects: the National Natural Science Foundation of China (61402116, 61403084); the Program of Science and Technology Commission of Shanghai Municipality (No. 15530701300, No. 15XD1520200, No. 17511106803); the 2012 IoT Program of the Ministry of Industry and Information Technology of China; the Key Project of the Ministry of Public Security (No. 2014JSYJA007); the Project of the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University (ESSCKF 2015-03); the Shanghai Rising-Star Program (17QB1401000); the Special Fund for Basic R&D Expenses of Central Level Public Welfare Scientific Research Institutions (C17384); and the National Key R&D Program of China (2016YFC0801304, 2017YFC0803705). It was also supported by the CCF-Venustech Open Research Fund (Grant No. CCF-VenustechRP2017006) and by the Guangxi Key Laboratory of Cryptography and Information Security (No. GCIS201719).

Author information

Corresponding author

Correspondence to Lei Song.

About this article

Cite this article

Ding, Z., Song, L., Zhang, X. et al. Selective deep ensemble for instance retrieval. Multimed Tools Appl 78, 5751–5767 (2019). https://doi.org/10.1007/s11042-018-5967-8

  • DOI: https://doi.org/10.1007/s11042-018-5967-8
