DOI: 10.1145/2964284.2967252 (short paper)

CNN vs. SIFT for Image Retrieval: Alternative or Complementary?

Published: 01 October 2016

ABSTRACT

In the past decade, SIFT has been widely used in most vision tasks such as image retrieval. In recent years, however, features from deep convolutional neural networks (CNNs) have achieved state-of-the-art performance in several tasks such as image classification and object detection. A natural question thus arises: for the image retrieval task, can CNN features substitute for SIFT? In this paper, we experimentally demonstrate that the two kinds of features are highly complementary. Building on this observation, we propose an image representation model, complementary CNN and SIFT (CCS), which fuses CNN and SIFT in a multi-level and complementary way. In particular, it can simultaneously describe scene-level, object-level, and point-level content in images. Extensive experiments on four image retrieval benchmarks show that CCS achieves state-of-the-art retrieval results.
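The abstract only names the general recipe (a scene/object-level CNN part fused with a point-level SIFT part), so the following is a minimal illustrative sketch of that style of fusion, not the authors' CCS implementation. It assumes OpenCV with SIFT support and PyTorch/torchvision; the ResNet-50 backbone truncated before its classifier, the VLAD-style aggregation of SIFT descriptors over a k-means codebook learned offline, and the L2-normalize-then-concatenate fusion are all illustrative assumptions.

    import cv2
    import numpy as np
    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    def cnn_descriptor(image_bgr, backbone, device="cpu"):
        """Scene-level part: globally pooled activations of a pretrained CNN."""
        preprocess = T.Compose([
            T.ToPILImage(),
            T.Resize((224, 224)),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])
        rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
        x = preprocess(rgb).unsqueeze(0).to(device)
        with torch.no_grad():
            feat = backbone(x)              # (1, 2048, 1, 1) after global pooling
        return feat.flatten().cpu().numpy()

    def sift_vlad_descriptor(image_bgr, codebook):
        """Point-level part: VLAD-style aggregation of SIFT descriptors."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
        k, d = codebook.shape
        vlad = np.zeros((k, d), dtype=np.float32)
        if desc is not None:
            # Assign each descriptor to its nearest codeword and accumulate residuals.
            dists = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            for i, a in enumerate(dists.argmin(axis=1)):
                vlad[a] += desc[i] - codebook[a]
        return vlad.flatten()

    def l2norm(v, eps=1e-12):
        return v / (np.linalg.norm(v) + eps)

    def fused_descriptor(image_bgr, backbone, codebook):
        """Concatenate the L2-normalized CNN and SIFT parts into one retrieval vector."""
        return np.concatenate([
            l2norm(cnn_descriptor(image_bgr, backbone)),
            l2norm(sift_vlad_descriptor(image_bgr, codebook)),
        ])

    # Example wiring (hypothetical inputs): a ResNet-50 truncated before its
    # classifier, and a SIFT codebook learned offline with k-means.
    backbone = torch.nn.Sequential(
        *list(models.resnet50(pretrained=True).children())[:-1]).eval()
    codebook = np.load("sift_codebook.npy").astype(np.float32)  # hypothetical file
    query_vec = fused_descriptor(cv2.imread("query.jpg"), backbone, codebook)

In practice the codebook would be learned offline (e.g. k-means over SIFT descriptors from a training set), and retrieval pipelines of this kind typically add PCA/whitening and a final re-normalization of the fused vector before nearest-neighbor search.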


Published in

MM '16: Proceedings of the 24th ACM International Conference on Multimedia
October 2016, 1542 pages
ISBN: 9781450336031
DOI: 10.1145/2964284
Copyright © 2016 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MM '16 Paper Acceptance Rate: 52 of 237 submissions, 22%. Overall Acceptance Rate: 995 of 4,171 submissions, 24%.

