DOI: 10.1145/2964284.2967252 (short paper)

CNN vs. SIFT for Image Retrieval: Alternative or Complementary?

Published: 01 October 2016

ABSTRACT

In the past decade, SIFT has been widely used in most vision tasks such as image retrieval. In recent years, however, features from deep convolutional neural networks (CNNs) have achieved state-of-the-art performance in several tasks such as image classification and object detection. A natural question thus arises: for the image retrieval task, can CNN features substitute for SIFT? In this paper, we experimentally demonstrate that the two kinds of features are highly complementary. Building on this observation, we propose an image representation model, complementary CNN and SIFT (CCS), which fuses CNN and SIFT in a multi-level and complementary way. In particular, it can simultaneously describe scene-level, object-level, and point-level content in images. Extensive experiments on four image retrieval benchmarks show that CCS achieves state-of-the-art retrieval results.
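The abstract only names the general recipe (a scene/object-level CNN part fused with a point-level SIFT part), so the following is a minimal illustrative sketch of that style of fusion, not the authors' CCS implementation. It assumes OpenCV with SIFT support and PyTorch/torchvision; the ResNet-50 backbone truncated before its classifier, the VLAD-style aggregation of SIFT descriptors over a k-means codebook learned offline, and the L2-normalize-then-concatenate fusion are all illustrative assumptions.

    import cv2
    import numpy as np
    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    def cnn_descriptor(image_bgr, backbone, device="cpu"):
        """Scene-level part: globally pooled activations of a pretrained CNN."""
        preprocess = T.Compose([
            T.ToPILImage(),
            T.Resize((224, 224)),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])
        rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        x = preprocess(rgb).unsqueeze(0).to(device)
        with torch.no_grad():
            feat = backbone(x)              # (1, 2048, 1, 1) after global pooling
        return feat.flatten().cpu().numpy()

    def sift_vlad_descriptor(image_bgr, codebook):
        """Point-level part: VLAD-style aggregation of SIFT descriptors."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
        k, d = codebook.shape
        vlad = np.zeros((k, d), dtype=np.float32)
        if desc is not None:
            # Assign each descriptor to its nearest codeword and accumulate residuals.
            dists = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            for i, a in enumerate(dists.argmin(axis=1)):
                vlad[a] += desc[i] - codebook[a]
        return vlad.flatten()

    def l2norm(v, eps=1e-12):
        return v / (np.linalg.norm(v) + eps)

    def fused_descriptor(image_bgr, backbone, codebook):
        """Concatenate the L2-normalized CNN and SIFT parts into one retrieval vector."""
        return np.concatenate([
            l2norm(cnn_descriptor(image_bgr, backbone)),
            l2norm(sift_vlad_descriptor(image_bgr, codebook)),
        ])

    # Example wiring (hypothetical inputs): a ResNet-50 truncated before its
    # classifier, and a SIFT codebook learned offline with k-means.
    backbone = torch.nn.Sequential(
        *list(models.resnet50(pretrained=True).children())[:-1]).eval()
    codebook = np.load("sift_codebook.npy").astype(np.float32)  # hypothetical file
    query_vec = fused_descriptor(cv2.imread("query.jpg"), backbone, codebook)

In practice the codebook would be learned offline (e.g. k-means over SIFT descriptors from a training set), and retrieval pipelines of this kind typically add PCA/whitening and a final re-normalization of the fused vector before nearest-neighbor search.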


Published in

MM '16: Proceedings of the 24th ACM International Conference on Multimedia
October 2016, 1542 pages
ISBN: 9781450336031
DOI: 10.1145/2964284
Copyright © 2016 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MM '16 Paper Acceptance Rate: 52 of 237 submissions, 22%. Overall Acceptance Rate: 995 of 4,171 submissions, 24%.

