research-article

Learning Discriminative Features for Image Retrieval

Authors:

Yingying Zhu

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval

Pages 96 - 104

https://doi.org/10.1145/3323873.3325032

Published: 05 June 2019

Abstract

Discriminative local features obtained from the activations of convolutional neural networks have proven essential for image retrieval, and many recent works aim to obtain more powerful and discriminative features to improve retrieval performance. In this work, we propose a new attention layer that assesses the importance of each local feature and assigns higher weights to the more discriminative ones. We further present a scale-and-mask module that scales the major components and filters out meaningless local features. By rescaling the major components on the feature maps, the module reduces the effect of their varying scales across images, and with the MAX-Mask it filters out redundant and confusing features. Finally, the resulting features are aggregated into the image representation. Experimental evaluations on standard image retrieval datasets demonstrate that the proposed method outperforms state-of-the-art methods.
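The abstract describes the pipeline only at a high level, so the following is a minimal, illustrative sketch in plain Python of how a mask, per-location attention weights and weighted aggregation can fit together; it is not the author's implementation. Assumptions: the mask follows the MAX-mask of Do et al. [9] (for each channel, keep the spatial location holding that channel's maximum activation); the learned attention layer is replaced by a hypothetical fixed `attention_vector` scored by dot product and normalized with a softmax; the scale operation is omitted; the order of operations is simplified; and aggregation is weighted sum pooling followed by L2 normalization. All function names are hypothetical.

```python
import math
from typing import List, Sequence

# A "feature map" here is a list of H*W local descriptors, each with C channels.
LocalFeatures = List[List[float]]


def max_mask(features: LocalFeatures) -> List[bool]:
    """Keep, for every channel, the location holding that channel's maximum.

    Assumed to resemble the MAX-mask of Do et al. [9]; the exact variant used
    in the paper is not given in the abstract.
    """
    num_locations = len(features)
    num_channels = len(features[0])
    keep = [False] * num_locations
    for c in range(num_channels):
        best = max(range(num_locations), key=lambda i: features[i][c])
        keep[best] = True
    return keep


def attention_weights(features: LocalFeatures, keep: Sequence[bool],
                      attention_vector: Sequence[float]) -> List[float]:
    """Softmax over dot-product scores of kept locations; masked ones get 0.

    `attention_vector` is a hypothetical stand-in for learned attention parameters.
    """
    scores = [sum(f * a for f, a in zip(feat, attention_vector)) for feat in features]
    peak = max(s for s, k in zip(scores, keep) if k)  # subtract max for stability
    exps = [math.exp(s - peak) if k else 0.0 for s, k in zip(scores, keep)]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(features: LocalFeatures, weights: Sequence[float]) -> List[float]:
    """Weighted sum pooling over locations, then L2 normalization."""
    num_channels = len(features[0])
    pooled = [sum(w * feat[c] for feat, w in zip(features, weights))
              for c in range(num_channels)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


def image_descriptor(features: LocalFeatures,
                     attention_vector: Sequence[float]) -> List[float]:
    """Mask -> attention weights -> aggregation into one global descriptor."""
    keep = max_mask(features)
    weights = attention_weights(features, keep, attention_vector)
    return aggregate(features, weights)


if __name__ == "__main__":
    # Toy 2x2 feature map (4 locations) with 3 channels.
    toy = [[0.9, 0.1, 0.0],
           [0.2, 0.8, 0.1],
           [0.1, 0.2, 0.7],
           [0.3, 0.3, 0.2]]
    print(max_mask(toy))  # [True, True, True, False]
    print(image_descriptor(toy, attention_vector=[1.0, 1.0, 1.0]))
```

With the toy input, the fourth location holds no per-channel maximum, so the mask removes it and it contributes nothing to the descriptor. In practice the local features would come from convolutional activations, as the abstract states, and the attention parameters would be learned rather than fixed.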

References

[1]

Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. 2016. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5297--5307.

[2]

Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu. 2014. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755 (2014).

[3]

Artem Babenko, Anton Slesarev, Alexandr Chigorin, and Victor Lempitsky. 2014. Neural codes for image retrieval. In European conference on computer vision. Springer, 584--599.

[4]

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).

[5]

Sean Bell and Kavita Bala. 2015. Learning visual similarity for product design with convolutional neural networks. ACM Transactions on Graphics (TOG), Vol. 34, 4 (2015), 98.

[6]

Stéphane Bres and Jean-Michel Jolion. 1999. Detection of interest points for image indexation. In International Conference on Advances in Visual Information Systems. Springer, 427--435.

[7]

Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu, and Alan L Yuille. 2016. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3640--3649.

[8]

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 248--255.

[9]

Thanh-Toan Do, Tuan Hoang, Dang-Khoa Le Tan, and Ngai-Man Cheung. 2018. From Selective Deep Convolutional Features to Compact Binary Representations for Image Retrieval. arXiv preprint arXiv:1802.02899 (2018).

[10]

Jianlong Fu, Heliang Zheng, and Tao Mei. 2017. Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition. In CVPR, Vol. 2. 3.

[11]

Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra. 2015. DRAW: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623 (2015).

[12]

Robert M Haralick. 1979. Statistical and structural approaches to texture. Proc. IEEE, Vol. 67, 5 (1979), 786--804.

[13]

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2980--2988.

[14]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770--778.

[15]

Rui Hou, Chen Chen, and Mubarak Shah. 2017. Tube convolutional neural network (T-CNN) for action detection in videos. In Proceedings of the IEEE International Conference on Computer Vision. 5822--5831.

[16]

Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7132--7141.

[17]

Jing Huang, S Ravi Kumar, Mandar Mitra, Wei-Jing Zhu, and Ramin Zabih. 1997. Image indexing using color correlograms. In Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on. IEEE, 762--768.

[18]

Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. 2015. Spatial transformer networks. In Advances in neural information processing systems. 2017--2025.

[19]

Anil K Jain and Aditya Vailaya. 1996. Image retrieval using color and shape. Pattern recognition, Vol. 29, 8 (1996), 1233--1244.

[20]

Hervé Jégou, Matthijs Douze, Cordelia Schmid, and Patrick Pérez. 2010. Aggregating local descriptors into a compact image representation. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 3304--3311.

[21]

Hervé Jégou and Andrew Zisserman. 2014. Triangulation embedding and democratic aggregation for image search. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3310--3317.

[22]

Yannis Kalantidis, Clayton Mellina, and Simon Osindero. 2016. Cross-dimensional weighting for aggregated deep convolutional features. In European Conference on Computer Vision. Springer, 685--701.

[23]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.

[24]

Kevin Lai, Liefeng Bo, Xiaofeng Ren, and Dieter Fox. 2011. A large-scale hierarchical multi-view RGB-D object dataset. In Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 1817--1824.

[25]

Haoning Lin, Zhenwei Shi, and Zhengxia Zou. 2017. Fully convolutional network with task partitioning for inshore ship detection in optical remote sensing images. IEEE Geoscience and Remote Sensing Letters, Vol. 14, 10 (2017), 1665--1669.

[26]

David G Lowe. 2004. Distinctive image features from scale-invariant keypoints. International journal of computer vision, Vol. 60, 2 (2004), 91--110.

[27]

Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015).

[28]

Jianchang Mao and Anil K Jain. 1992. Texture classification and segmentation using multiresolution simultaneous autoregressive models. Pattern recognition, Vol. 25, 2 (1992), 173--188.

[29]

Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. 2014. Recurrent models of visual attention. In Advances in neural information processing systems. 2204--2212.

[30]

Florent Perronnin and Christopher Dance. 2007. Fisher kernels on visual vocabularies for image categorization. In 2007 IEEE conference on computer vision and pattern recognition. IEEE, 1--8.

[31]

James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. 2007. Object retrieval with large vocabularies and fast spatial matching. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 1--8.

[32]

James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. 2008. Lost in quantization: Improving particular object retrieval in large scale image databases. (2008).

[33]

Filip Radenovic, Johannes L Schonberger, Dinghuang Ji, Jan-Michael Frahm, Ondrej Chum, and Jiri Matas. 2016. From dusk till dawn: Modeling in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5488--5496.

[34]

Filip Radenović, Giorgos Tolias, and Ondřej Chum. 2016. CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. In European conference on computer vision. Springer, 3--20.

[35]

Filip Radenović, Giorgos Tolias, and Ondrej Chum. 2018. Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).

[36]

Johannes L Schonberger, Filip Radenovic, Ondrej Chum, and Jan-Michael Frahm. 2015. From single image query to detailed 3D reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5126--5134.

[37]

Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. 2014. CNN features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 806--813.

[38]

Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

[39]

Josef Sivic and Andrew Zisserman. 2003. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV). IEEE, 1470.

[40]

Markus Andreas Stricker and Markus Orengo. 1995. Similarity of color images. In Storage and Retrieval for Image and Video Databases III, Vol. 2420. International Society for Optics and Photonics, 381--393.

[41]

Bin Sun, Chen Chen, Yingying Zhu, and Jianmin Jiang. 2019. GeoCapsNet: Aerial to Ground view Image Geo-localization using Capsule Network. arXiv preprint arXiv:1904.06281 (2019).

[42]

Michael J Swain and Dana H Ballard. 1991. Color indexing. International journal of computer vision, Vol. 7, 1 (1991), 11--32.

[43]

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1--9.

[44]

Yicong Tian, Chen Chen, and Mubarak Shah. 2017. Cross-view image matching for geo-localization in urban environments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3608--3616.

[45]

Giorgos Tolias, Ronan Sicre, and Hervé Jégou. 2015. Particular object retrieval with integral max-pooling of CNN activations. arXiv preprint arXiv:1511.05879 (2015).

[46]

Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, and Xiaoou Tang. 2017. Residual attention network for image classification. arXiv preprint arXiv:1704.06904 (2017).

[47]

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning. 2048--2057.

[48]

Joe Yue-Hei Ng, Fan Yang, and Larry S Davis. 2015. Exploiting local features from deep networks for image retrieval. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 53--61.

[49]

Baochang Zhang, Jiaxin Gu, Chen Chen, Jungong Han, Xiangbo Su, Xianbin Cao, and Jianzhuang Liu. 2018. One-two-one networks for compression artifacts reduction in remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 145 (2018), 184--196.

[50]

Yingying Zhu, Jiong Wang, Lingxi Xie, and Liang Zheng. 2018. Attention-based pyramid aggregation network for visual place recognition. In 2018 ACM Multimedia Conference on Multimedia Conference. ACM, 99--107.

Cited By

Li Y, Guan C, Gao J (2023). TsP-Tran: Two-Stage Pure Transformer for Multi-Label Image Retrieval. Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, 425-433. doi:10.1145/3591106.3592269. Online publication date: 12-Jun-2023.
https://dl.acm.org/doi/10.1145/3591106.3592269
Zhu Y, Wang Y, Chen H, Guo Z, Huang Q (2023). Large-Scale Image Retrieval with Deep Attentive Global Features. International Journal of Neural Systems, 33(03). doi:10.1142/S0129065723500132. Online publication date: 25-Feb-2023.
https://doi.org/10.1142/S0129065723500132
Fu H, Li Y, Zhang H, Liu J, Yao T (2020). Rank-embedded Hashing for Large-scale Image Retrieval. In Gurrin C, Þór Jónsson B, Kando N, Schoeffmann K, Chen P, O'Connor N (eds.), Proceedings of the 2020 International Conference on Multimedia Retrieval, 563-570. doi:10.1145/3372278.3390716. Online publication date: 8-Jun-2020.
https://dl.acm.org/doi/10.1145/3372278.3390716

Index Terms

Learning Discriminative Features for Image Retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations

Recommendations

Image Retrieval Using Fused Deep Convolutional Features

This paper proposes an image retrieval using fused deep convolutional features to solve the semantic gap between low-level features and high-level semantic features of traditional contend-based image retrieval method. Firstly, the improved network ...
Effective features in content-based image retrieval from a combination of low-level features and deep Boltzmann machine
Abstract
Image retrieval is a convenient way to browse and search for a set of similar images. The main challenge of Content-based Image Retrieval (CBIR) systems is to extract the appropriate feature vector for image description. In this research, a ...
Content -- Based Image Retrieval Using the Dual-Tree Complex Wavelet Transform
MCSI '14: Proceedings of the 2014 International Conference on Mathematics and Computers in Sciences and in Industry

The following paper presents a novel and effective algorithm for Content-Based Image Retrieval. To design it we used the Dual-Tree Complex Wavelet Transform for image feature extraction and Hausdorff distance to compute similarity distance between the ...

Information & Contributors

Information

Published In

ICMR '19: Proceedings of the 2019 on International Conference on Multimedia Retrieval

June 2019

427 pages

ISBN:9781450367653

DOI:10.1145/3323873

General Chairs: Abdulmotaleb El Saddik (University of Ottawa, Canada), Alberto Del Bimbo (University of Florence, Italy), Zhongfei Zhang (Binghamton University, State University of New York, USA)

Program Chairs: Alexander Hauptmann (Carnegie Mellon University, USA), K. Selcuk Candan (Arizona State University, USA), Marco Bertini (University of Florence, Italy), Lexing Xie (Australian National University, Australia), Xiao-Yong Wei (Sichuan University, China)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China (Grant No. 61602314)
Natural Science Foundation of Guangdong Province of China (Grant No. 2016A030313043)
Fundamental Research Project in the Science and Technology Plan of Shenzhen (Grant No. JCYJ20160331114551175)

Conference

ICMR '19

ICMR '19: International Conference on Multimedia Retrieval

June 10 - 13, 2019

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Total Citations: 3
Total Downloads: 316
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 1

Reflects downloads up to 13 Feb 2025
