
A multi-level descriptor using ultra-deep feature for image retrieval

Published in Multimedia Tools and Applications

Abstract

CNN (Convolutional Neural Network)-based descriptor generation has been studied extensively in recent years for image retrieval. CNN deep features trained for image classification have been shown to transfer well to the image retrieval task. However, building a highly discriminative descriptor from CNN features remains an important issue: the feature of the fully-connected layer is usually used on its own, while the shallow features of an image are ignored. In this paper, we propose a simple and effective multi-level descriptor. First, we propose a multi-level feature fusion (MFF) method that captures low-level color/texture information and high-level semantic information simultaneously. MFF replaces the commonly used “object level” with the “part level”: the filters of a convolutional layer are treated as part detectors, instead of invoking an explicit object-detection method. The complementary nature of low-level and high-level features benefits MFF greatly. Second, we train a neural network with class information to further improve the discriminative power of MFF. MFF achieves good performance on public image retrieval datasets. Finally, we propose a compressed version that achieves performance close to that of the uncompressed version.
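To make the fusion idea concrete, the following is a minimal sketch of a multi-level descriptor in PyTorch: a convolutional feature map (part-level responses) and a fully-connected feature (semantic level) are pooled, L2-normalized, and concatenated into a single vector. The VGG16 backbone, the specific layers, and the pooling operations are illustrative assumptions, not the exact configuration used in the paper.

# A minimal sketch (in PyTorch) of the multi-level fusion idea: pool a
# convolutional feature map (part-level responses) and a fully-connected
# feature (semantic level), L2-normalize each, and concatenate them.
# The VGG16 backbone, layer choices, and pooling here are illustrative
# assumptions, not the exact configuration used in the paper.
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").eval()

@torch.no_grad()
def multi_level_descriptor(image: torch.Tensor) -> torch.Tensor:
    """image: (1, 3, 224, 224) tensor, ImageNet-normalized."""
    conv_maps = backbone.features(image)             # (1, 512, 7, 7) conv activations
    part_feat = conv_maps.amax(dim=(2, 3))           # spatial max-pool -> (1, 512)

    pooled = backbone.avgpool(conv_maps).flatten(1)  # (1, 25088)
    fc_feat = backbone.classifier[:5](pooled)        # fc7 output -> (1, 4096)

    # Normalize each level so neither dominates, then fuse by concatenation.
    fused = torch.cat([F.normalize(part_feat, dim=1),
                       F.normalize(fc_feat, dim=1)], dim=1)
    return F.normalize(fused, dim=1)                 # (1, 4608) final descriptor

At search time, images can then be ranked by the dot product (cosine similarity) between the L2-normalized query and database descriptors. In the spirit of the compressed version mentioned above, the fused vector could, for example, be reduced with PCA, although the paper's actual compression scheme may differ.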



Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61572211).

Author information

Correspondence to Junqing Yu.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Z., Yu, J. A multi-level descriptor using ultra-deep feature for image retrieval. Multimed Tools Appl 78, 25655–25672 (2019). https://doi.org/10.1007/s11042-019-07771-2

