
INSTRE: A New Benchmark for Instance-Level Object Retrieval and Recognition

Published: 05 February 2015

Abstract

Over the last several decades, research on visual object retrieval and recognition has advanced rapidly. However, while category-level tasks dominate the community's attention, instance-level tasks (especially recognition) have not yet received adequate focus. Applications such as content-based search engines and robot vision systems have raised awareness of the need to study instance-level tasks in more realistic and challenging scenarios. Motivated by the limited scope of existing instance-level datasets, in this article we propose a new benchmark for INSTance-level visual object REtrieval and REcognition (INSTRE). Compared with existing datasets, INSTRE has the following major properties: (1) balanced data scale, (2) more diverse intraclass instance variations, (3) cluttered and less contextual backgrounds, (4) object localization annotation for each image, and (5) carefully manipulated double-labelled images for measuring the case of multiple objects within one image. We quantify and visualize the merits of the INSTRE data and compare them extensively against existing datasets. Then, on INSTRE, we comprehensively evaluate several popular algorithms for large-scale object retrieval using multiple evaluation metrics. Experimental results show that all the methods suffer a performance drop on INSTRE, indicating that the problem remains challenging. Finally, we integrate these algorithms into a simple yet efficient scheme for recognition and compare it with classification-based methods. Importantly, we introduce the realistic multi-object recognition problem. All experiments are conducted in both the single-object and multiple-object cases.
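The abstract does not name the evaluation metrics it uses. As an illustration only, mean average precision (mAP) is one metric commonly reported for retrieval benchmarks, and the sketch below shows how it could be computed from ranked result lists. The function names and the input layout (a ranked list of image identifiers plus a set of relevant identifiers per query) are assumptions made for this example and are not taken from the article.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision for one query.

    `ranked` is the retrieved list, best match first; `relevant` is the
    ground-truth set for the query (hypothetical layout, see lead-in).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision measured at each rank where a relevant item appears.
            precision_sum += hits / k
    # Normalise by the number of relevant items, so misses are penalised.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of per-query average precision over all (ranked, relevant) pairs."""
    aps = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(aps) / len(aps) if aps else 0.0
```

Normalising by the size of the relevant set, rather than by the number of hits, means that a relevant image missing from the ranked list lowers the score. Other conventions exist, so reported numbers are only comparable when the same definition is used.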


Information & Contributors

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 11, Issue 3
January 2015
173 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2733235

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 February 2015
Accepted: 01 October 2014
Revised: 01 August 2014
Received: 01 March 2014
Published in TOMM Volume 11, Issue 3

Author Tags

  1. Instance-level
  2. annotation
  3. dataset
  4. evaluation
  5. multiple object
  6. object recognition
  7. object retrieval

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Basic Research Program of China (973 Program)
  • National Natural Science Foundation of China
  • Key Technologies R&D Program of China
  • National Hi-Tech Development Program (863 Program) of China
  • Lenovo Outstanding Young Scientists Program (LOYS)
