
Performance evaluation of large-scale object recognition system using bag-of-visual words model

Published in: Multimedia Tools and Applications

Abstract

Object recognition technology is commonly used to recognize specific objects such as book covers, landmarks, and vehicles. In most systems it is supported by multi-dimensional local image descriptors, which are designed to be robust to environmental changes such as variations in illumination, viewing angle, and scale. When a database contains many target objects, object recognition over a large-scale collection of local image descriptors is not a trivial task because of the descriptors' high dimensionality. To obtain consistent responses from a large-scale database within a reasonable time delay, a proper data structure supporting indexing and querying is needed. A vocabulary tree is such a data structure built over local image descriptors, and it is commonly used to cope with massive descriptor databases. Using a vocabulary tree, a local image descriptor can be mapped to the ID of one of the tree's leaf nodes, yielding a visual word for object recognition. The visual words can then be exploited effectively by a traditional text retrieval engine. In this study, we built a large-scale object recognition system using a vocabulary tree with one million leaf nodes over Scale-Invariant Feature Transform (SIFT) descriptors, which are among the most promising local image descriptors in terms of precision. We implemented the proposed system using publicly available software so that further enhancements and reproduction of the results can be accomplished easily. We then compared and evaluated the proposed system's performance against the current MPEG CDVS (Compact Descriptors for Visual Search) standard on a database containing three categories of two-dimensional planar object datasets together with one million distractor images. In addition to these datasets, which are equivalent to those of CDVS, we added a new dataset designed to mimic realistic occlusion and clutter effects.
Experimental results show that the proposed system's performance is comparable to that of CDVS, achieving 90 % precision at a retrieval time of 5 s. We also identify characteristics of the vocabulary tree that limit its adaptation to a specific application domain.
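The vocabulary-tree quantization described in the abstract can be illustrated with a small sketch: a hierarchical k-means tree is trained on descriptors, and each descriptor is then pushed down the tree so that its leaf-node ID becomes a visual word. This is a toy illustration only, assuming random vectors stand in for SIFT descriptors; the class and function names are hypothetical, not the paper's implementation.

```python
# Toy vocabulary tree (hierarchical k-means) mapping descriptors to leaf IDs.
# Random 128-D vectors stand in for SIFT descriptors; illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=10):
    """Plain Lloyd's k-means; returns centroids and point assignments."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids, labels

class VocabularyTree:
    """Hierarchical k-means tree: each descriptor maps to a leaf-node ID."""
    def __init__(self, branch=3, depth=2):
        self.branch, self.depth = branch, depth
        self.nodes = {}  # path tuple -> centroids of that node's children

    def fit(self, X, path=()):
        # Stop splitting at the target depth or when too few points remain.
        if len(path) == self.depth or len(X) < self.branch:
            return
        centroids, labels = kmeans(X, self.branch)
        self.nodes[path] = centroids
        for j in range(self.branch):
            self.fit(X[labels == j], path + (j,))

    def leaf_id(self, x):
        # Descend the tree, always following the nearest child centroid.
        path = ()
        while path in self.nodes:
            c = self.nodes[path]
            path = path + (int(np.linalg.norm(c - x, axis=1).argmin()),)
        return path  # the leaf path serves as the visual-word ID

# Train on fake "SIFT" descriptors, then quantize a query image's features.
train = rng.normal(size=(300, 128)).astype(np.float32)
tree = VocabularyTree(branch=3, depth=2)
tree.fit(train)
query_words = [tree.leaf_id(d) for d in rng.normal(size=(5, 128))]
print(len(set(tree.leaf_id(d) for d in train)))  # distinct visual words used
```

With branch factor 3 and depth 2 this yields at most 9 visual words; the paper's tree instead has on the order of one million leaf nodes, which is what makes the subsequent text-retrieval step discriminative.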
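Once database and query images are quantized to visual words, retrieval by a traditional text engine reduces to TF-IDF scoring of word histograms. The paper uses an off-the-shelf engine for this; the sketch below, with made-up word IDs and image names, only illustrates the scoring idea.

```python
# Toy TF-IDF retrieval over "visual word" documents, assuming each database
# image has already been quantized to a list of leaf-node IDs.
import math
from collections import Counter

db = {
    "book_cover": ["w3", "w7", "w7", "w12"],
    "landmark":   ["w1", "w3", "w9"],
    "vehicle":    ["w7", "w12", "w12", "w5"],
}

n_docs = len(db)
df = Counter(w for words in db.values() for w in set(words))
idf = {w: math.log(n_docs / df[w]) for w in df}  # rare words score higher

def score(query_words):
    """Rank database images by TF-IDF match against the query's words."""
    q = Counter(w for w in query_words if w in idf)
    results = {}
    for name, words in db.items():
        tf = Counter(words)
        results[name] = sum(q[w] * tf[w] * idf[w] ** 2 for w in q)
    return sorted(results.items(), key=lambda kv: -kv[1])

ranking = score(["w12", "w12", "w7"])
print(ranking[0][0])
```

Because scoring touches only the inverted-index entries for the query's words, retrieval time stays manageable even with one million distractor images, which is what the 5 s retrieval-time result reflects.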

(Figures 1–9 appear in the full article.)


Acknowledgments

This research was supported by the Basic Science Research Program of the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (Grant no. 2012006817).

Author information

Correspondence to Kyoungro Yoon.


About this article


Cite this article

Kim, MU., Yoon, K. Performance evaluation of large-scale object recognition system using bag-of-visual words model. Multimed Tools Appl 74, 2499–2517 (2015). https://doi.org/10.1007/s11042-014-2152-6
