Instance-level object retrieval via deep region CNN

Mei, Shuhuan; Min, Weiqing; Duan, Hua; Jiang, Shuqiang

doi:10.1007/s11042-018-6427-1

Published: 13 September 2018

Multimedia Tools and Applications, Volume 78, pages 13247–13261 (2019)

Shuhuan Mei^1,2, Weiqing Min^2, Hua Duan^1 & Shuqiang Jiang^2,3

Abstract

Instance retrieval is a fundamental problem in the multimedia field because of its many applications. Since relevance is defined at the instance level, it is more challenging than traditional image retrieval. Recent advances show that Convolutional Neural Networks (CNNs) offer an attractive method for image feature representation. However, the CNN method extracts features from the whole image, so the extracted features contain a large amount of noisy background information, leading to poor retrieval performance. To solve this problem, this paper proposes a deep region CNN method with object detection for instance-level object retrieval, which has two phases: offline Faster R-CNN training and online instance retrieval. First, we train a Faster R-CNN model to better locate the object regions. Second, we extract CNN features from the detected object region and then retrieve relevant images based on the visual similarity of these features. Furthermore, we use three different strategies for feature fusion based on the detected object region candidates from Faster R-CNN. We conduct experiments on a large dataset, INSTRE, with 23,070 object images and an additional one million distractor images. Qualitative and quantitative evaluation results demonstrate the advantage of the proposed method. In addition, extensive experiments on the Oxford dataset further validate its effectiveness.
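
The online phase described in the abstract (region features, fusion, similarity ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three fusion strategies, so `top1`, `mean` and `max` below are assumptions chosen for illustration, cosine similarity is an assumed similarity measure, and the detector and CNN are left out entirely, with region feature vectors and detection scores taken as given inputs. The function names `l2_normalize`, `fuse_regions` and `rank_database` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_regions(features: Sequence[Sequence[float]],
                 scores: Sequence[float],
                 strategy: str = "top1") -> Vector:
    """Fuse per-region feature vectors into one unit-length image descriptor.

    The three strategies are illustrative assumptions, not the paper's:
      top1 - keep only the feature of the highest-scoring detected region
      mean - average the features of all candidate regions
      max  - take the element-wise maximum over all candidate regions
    """
    if not features or len(features) != len(scores):
        raise ValueError("need one detection score per region feature")
    if strategy == "top1":
        best = max(range(len(scores)), key=lambda i: scores[i])
        fused = list(features[best])
    elif strategy == "mean":
        fused = [sum(col) / len(features) for col in zip(*features)]
    elif strategy == "max":
        fused = [max(col) for col in zip(*features)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return l2_normalize(fused)


def rank_database(query: Sequence[float],
                  database: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs sorted by descending dot product.

    With unit-length vectors the dot product equals cosine similarity.
    """
    sims = [(i, sum(q * d for q, d in zip(query, vec)))
            for i, vec in enumerate(database)]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)


# Toy data: two candidate regions for the query image, three database images.
regions = [[1.0, 0.0], [0.0, 1.0]]
scores = [0.9, 0.2]
database = [l2_normalize(v) for v in ([0.0, 1.0], [1.0, 0.1], [1.0, 1.0])]

query = fuse_regions(regions, scores, strategy="top1")
ranking = rank_database(query, database)
```

In this toy setup the `top1` descriptor follows the confident first region, so database images that point in the same direction rank first. A real system would replace the toy vectors with features extracted from the regions that the trained detector returns.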

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China (61532018, 61322212, 61602437, 61672497, 61472229 and 61202152), in part by the Beijing Municipal Commission of Science and Technology (D161100001816001), in part by the Beijing Natural Science Foundation (4174106), in part by the Lenovo Outstanding Young Scientists Program, in part by the National Program for Special Support of Eminent Professionals and the National Program for Support of Top-notch Young Professionals, and in part by the China Postdoctoral Science Foundation (2016M590135, 2017T100110). This work was also supported in part by the Science and Technology Development Fund of Shandong Province of China (2016ZDJS02A11 and ZR2017MF027), the Taishan Scholar Climbing Program of Shandong Province, and the SDUST Research Fund (2015TDJH102).

Author information

Authors and Affiliations

College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, 266590, China
Shuhuan Mei & Hua Duan
Key Lab of Intelligent Information Processing, Institute of Computing Technology, CAS, Beijing, 100190, China
Shuhuan Mei, Weiqing Min & Shuqiang Jiang
University of Chinese Academy of Sciences, Beijing, 100049, China
Shuqiang Jiang

Corresponding author

Correspondence to Hua Duan.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mei, S., Min, W., Duan, H. et al. Instance-level object retrieval via deep region CNN. Multimed Tools Appl 78, 13247–13261 (2019). https://doi.org/10.1007/s11042-018-6427-1

Received: 31 July 2017
Revised: 06 February 2018
Accepted: 20 July 2018
Published: 13 September 2018
Issue Date: 30 May 2019
DOI: https://doi.org/10.1007/s11042-018-6427-1
