ABSTRACT
Over the past decade, SIFT has been widely used in a wide range of vision tasks, such as image retrieval. In recent years, however, features from deep convolutional neural networks (CNNs) have achieved state-of-the-art performance in tasks such as image classification and object detection. A natural question therefore arises: for image retrieval, can CNN features substitute for SIFT? In this paper, we demonstrate experimentally that the two kinds of features are highly complementary. Building on this finding, we propose an image representation model, complementary CNN and SIFT (CCS), which fuses CNN and SIFT features in a multi-level, complementary way. In particular, CCS can simultaneously describe the scene-level, object-level and point-level content of an image. Extensive experiments on four image retrieval benchmarks show that CCS achieves state-of-the-art retrieval results.
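The abstract does not state how CCS combines the two kinds of features. Purely to illustrate why complementary descriptors can improve a ranking, here is a minimal sketch of a generic score-level (late) fusion baseline; it is an assumption for illustration, not the CCS model. A global CNN descriptor is compared by cosine similarity, a bag-of-visual-words histogram (assumed to come from quantised SIFT) is compared by histogram intersection, and the two min-max-normalised score lists are mixed with a hypothetical weight `alpha`. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (e.g. global CNN descriptors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def histogram_intersection(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Similarity of two sparse L1-normalised visual-word histograms (word id -> weight)."""
    return sum(min(w, b[k]) for k, w in a.items() if k in b)


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so that the two feature types are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # No spread: this feature type cannot discriminate, so it contributes nothing.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fused_ranking(
    query_cnn: Sequence[float],
    query_bow: Dict[int, float],
    db_cnn: List[Sequence[float]],
    db_bow: List[Dict[int, float]],
    alpha: float = 0.5,  # hypothetical weight on the CNN score
) -> List[int]:
    """Return database indices sorted from most to least similar to the query."""
    cnn_scores = min_max([cosine(query_cnn, d) for d in db_cnn])
    sift_scores = min_max([histogram_intersection(query_bow, d) for d in db_bow])
    fused = [alpha * c + (1.0 - alpha) * s for c, s in zip(cnn_scores, sift_scores)]
    # sorted() is stable, so tied images keep their database order.
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    # Toy data: image 0 matches only the CNN feature, image 1 only the SIFT
    # histogram, and image 2 partially matches both.
    query_cnn = [1.0, 0.0]
    query_bow = {3: 0.5, 7: 0.5}
    db_cnn = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    db_bow = [{1: 1.0}, {3: 0.5, 7: 0.5}, {3: 1.0}]
    print(fused_ranking(query_cnn, query_bow, db_cnn, db_bow))
```

With these toy inputs the CNN-only ranking (`alpha=1.0`) is `[0, 2, 1]` and the SIFT-only ranking (`alpha=0.0`) is `[1, 2, 0]`, while the equal-weight fusion puts image 2, which agrees moderately with both features, first. Score-level fusion is chosen here only because it is the simplest way to combine two descriptors; the multi-level scheme described in the abstract is more involved.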
Index Terms
- CNN vs. SIFT for Image Retrieval: Alternative or Complementary?