Abstract
Convolutional neural network (CNN)-based descriptor generation has been studied extensively in recent years for image retrieval. Deep CNN features trained for image classification have been shown to transfer well to the image retrieval task. However, building a highly discriminative descriptor from CNN features remains an open problem. Typically the features of the fully-connected layer are used and the shallow features of an image are ignored. In this paper, we propose a simple and effective multi-level descriptor. First, we propose a multi-level feature fusion (MFF) method that captures low-level color/texture information and high-level semantic information simultaneously. MFF replaces the commonly used “object-level” representation with a “part-level” one: the filters of a convolutional layer are treated as part detectors, so no explicit object detector is needed. MFF benefits greatly from the complementary nature of low-level and high-level features. Second, we train a neural network with class information to further improve the discriminative power of MFF. MFF achieves good performance on public image retrieval datasets. Finally, we propose a compressed version whose performance is close to that of the uncompressed version.
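The general idea of fusing features from several network depths can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the pooling, normalization, or compression steps, so the choices below (spatial max-pooling per filter, L2 normalization, and PCA for compression) are assumptions, and the function names `pool_layer`, `fuse_levels`, and `pca_compress` are hypothetical. Random arrays stand in for real convolutional activations.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    return v / (np.linalg.norm(v) + eps)


def pool_layer(feature_map):
    """Reduce a (C, H, W) activation map to a C-dimensional vector.

    Each channel is treated as a "part detector"; its strongest spatial
    response is kept (max-pooling is an assumption, not from the paper).
    """
    channels = feature_map.shape[0]
    return feature_map.reshape(channels, -1).max(axis=1)


def fuse_levels(feature_maps):
    """Concatenate per-layer pooled vectors into one multi-level descriptor.

    Each layer is normalized before concatenation so that no single depth
    dominates, then the fused vector is normalized again.
    """
    parts = [l2_normalize(pool_layer(fm)) for fm in feature_maps]
    return l2_normalize(np.concatenate(parts))


def pca_compress(descriptors, dim):
    """Project (N, D) descriptors onto their top `dim` principal components.

    PCA is used here only as a stand-in for the unspecified compression step.
    """
    centered = descriptors - descriptors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:dim].T
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    return projected / (norms + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical activation shapes for a shallow, a middle, and a deep layer.
    shapes = [(64, 28, 28), (256, 14, 14), (512, 7, 7)]
    database = np.stack(
        [fuse_levels([rng.random(s) for s in shapes]) for _ in range(10)]
    )
    compact = pca_compress(database, dim=4)
    print(database.shape, compact.shape)
```

Retrieval with such descriptors would then reduce to ranking database images by the dot product between their unit-length descriptors and the query descriptor.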
Acknowledgements
The work was supported by the National Natural Science Foundation of China (No. 61572211).
About this article
Cite this article
Wu, Z., Yu, J. A multi-level descriptor using ultra-deep feature for image retrieval. Multimed Tools Appl 78, 25655–25672 (2019). https://doi.org/10.1007/s11042-019-07771-2
DOI: https://doi.org/10.1007/s11042-019-07771-2