DOI: 10.1145/1743384.1743438
poster

Distances and weighting schemes for bag of visual words image retrieval

Published: 29 March 2010

Abstract

Current text retrieval techniques make it possible to index and retrieve text documents very efficiently and with good accuracy. Image retrieval, by contrast, is still very coarse and does not yield satisfactory results. Computer vision researchers therefore try to benefit from text retrieval contributions to enhance their retrieval systems. In particular, Sivic and Zisserman, with their Video Google framework [1], propose a description of images similar to standard text descriptors: images are described by elementary image parts, called visual words. They can thus perform image retrieval using the standard Vector Space Model (VSM) of Information Retrieval (IR) and benefit from classical IR techniques such as inverted files. Among available text retrieval techniques, automatically finding the most relevant words to describe a document has been intensively studied for texts, but not for images. In this paper, we propose to explore the use of term weighting techniques and classical distances from text retrieval in the case of images. These weights are standard VSM weights, weights derived from probabilistic models of IR, or new weighting schemes that we propose. Our experiments, performed on several datasets, show that no weighting scheme improves retrieval on every dataset, but also that choosing weights that fit the properties of the dataset can improve precision and MAP by up to 10%. This study provides some interesting insights into the semantic and statistical differences between textual and visual words, and into the way visual word-based image retrieval systems can be optimized. It also shows some limits of the bag of visual words model, and the relation between Minkowski distances and local weighting. Finally, this study questions some experimental habits commonly found in the literature (choice of L1 distance, TF*IDF weights and evaluation using one dataset only).
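
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of TF*IDF weighting applied to bag-of-visual-words histograms, followed by ranking with a Minkowski distance of order p. It is an illustration under stated assumptions, not the authors' implementation: the function names (`tf_idf`, `minkowski`, `rank`) are hypothetical, and the TF and IDF formulas are one common textbook variant that may differ from the variants evaluated in the paper.

```python
import math


def tf_idf(histograms):
    """Weight raw visual-word counts with one common TF*IDF variant.

    histograms: list of equal-length lists, one per image, where entry w
    is the number of occurrences of visual word w in that image.
    Assumption: tf = count / total count, idf = log(N / document frequency).
    """
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images that contain each visual word.
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    weighted = []
    for h in histograms:
        total = sum(h)
        row = []
        for w in range(n_words):
            tf = h[w] / total if total else 0.0
            idf = math.log(n_images / df[w]) if df[w] else 0.0
            row.append(tf * idf)
        weighted.append(row)
    return weighted


def minkowski(u, v, p):
    """Minkowski distance of order p > 0.

    p = 1 is the L1 distance, p = 2 the Euclidean distance; 0 < p < 1 gives
    a so-called fractional distance (not a true metric, but usable for ranking).
    """
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def rank(query_index, vectors, p=1.0):
    """Return the indices of all other images, closest to the query first."""
    q = vectors[query_index]
    others = [i for i in range(len(vectors)) if i != query_index]
    return sorted(others, key=lambda i: minkowski(q, vectors[i], p))
```

A call such as `rank(0, tf_idf(histograms), p=0.5)` ranks the collection against the first image with a fractional distance. Keeping the weighting and the distance order `p` as separate parameters makes it easy to try several combinations per dataset, in line with the abstract's observation that no single choice works best everywhere.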

References

[1]
Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: Proceedings of ICCV. Volume 2., Nice, France (2003) 1470--1477
[2]
Bosch, A., Zisserman, A., Munoz, X.: Scene classification via pLSA. In: Proceedings of ECCV. (2006)
[3]
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L.: A comparison of affine region detectors. International Journal of Computer Vision 65(1-2) (2005) 43--72
[4]
Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on PAMI 27(10) (2005) 1615--1630
[5]
Tirilly, P., Claveau, V., Gros, P.: A review of weighting schemes for bag of visual words image retrieval. Technical Report 1927, IRISA, Rennes, France (April 2009)
[6]
Yang, J., Jiang, Y. G., Hauptmann, A. G., Ngo, C. W.: Evaluating bag-of-visual-words representations in scene classification. In: Proceedings of the international workshop on Multimedia Information Retrieval, New York, NY, USA, ACM (2007) 197--206
[7]
Jiang, Y. G., Ngo, C. W., Yang, J.: Towards optimal bag-of-features for object categorization and semantic video retrieval. In: Proceedings of CIVR, New York, NY, USA, ACM (2007) 494--501
[8]
Jiang, Y. G., Ngo, C. W.: Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval. Computer Vision and Image Understanding (2008)
[9]
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in quantization: Improving particular object retrieval in large scale image databases. In: Proceedings of the international conference on Computer Vision And Pattern Recognition (CVPR). (2008)
[10]
Chen, X., Hu, X., Shen, X.: Spatial weighting for bag-of-visual-words and its application in content-based image retrieval. In: Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. (2009) 867--874
[11]
Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Proceedings of CVPR, Washington, DC, USA, IEEE Computer Society (2006) 2161--2168
[12]
Jegou, H., Douze, M., Schmid, C.: Hamming embedding and weak spatial consistency for large scale image search. In: Proceedings of ECCV. (2008)
[13]
Jegou, H., Harzallah, H., Schmid, C.: A contextual dissimilarity measure for accurate and efficient image search. In: Proceedings of CVPR. (2007) 1--8
[14]
Aggarwal, C. C., Hinneburg, A., Keim, D. A.: On the surprising behavior of distance metrics in high dimensional space. In: Lecture Notes in Computer Science, Springer (2001) 420--434
[15]
Robertson, S.: The probability ranking principle in information retrieval. Journal of Documentation 33 (1977) 294--304
[16]
Jones, K. S., Walker, S., Robertson, S. E.: A probabilistic model of information retrieval: development and comparative experiments (part 2). Information Processing and Management 36(6) (2000) 809--840
[17]
Jones, K. S., Walker, S., Robertson, S. E.: A probabilistic model of information retrieval: development and comparative experiments (part 1). Information Processing and Management 36(6) (2000) 779--808
[18]
Amati, G., Van Rijsbergen, C. J.: Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems 20(4) (2002) 357--389
[19]
Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: Automatic query expansion with a generative feature model for object retrieval. In: Proceedings of ICCV, Rio De Janeiro, Brazil (2007)
[20]
Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2007)
[21]
Zheng, Q.F., Wang, W.Q., Gao, W.: Effective and efficient object-based image retrieval using visual phrases. In: Proceedings of ACM Multimedia, New York, USA, ACM (2006) 77--80
[22]
Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. In: CVPR, Workshop on Generative-Model Based Vision. (2004)
[23]
Barros, J., French, J., Martin, W., Kelly, P., Cannon, M.: Using the triangle inequality to reduce the number of comparisons required for similarity-based retrieval. In: Proc. of SPIE/IS&T Conf. on Storage and Retrieval for Image and Video Databases IV, SPIE (1996) 392--403

Published In

MIR '10: Proceedings of the international conference on Multimedia information retrieval
March 2010
600 pages
ISBN:9781605588155
DOI:10.1145/1743384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bags of visual words
  2. divergence from randomness
  3. fractional distance
  4. image retrieval
  5. minkowski distance
  6. text retrieval
  7. tf*idf
  8. weighting schemes

Qualifiers

  • Poster

Conference

MIR '10
MIR '10: International Conference on Multimedia Information Retrieval
March 29 - 31, 2010
Philadelphia, Pennsylvania, USA

Cited By

  • (2023) Bag of Visual Words Model - A Mathematical Approach. Advanced Mathematical Applications in Data Science, pp. 68-79. DOI: 10.2174/9789815124842123010007. Online publication date: 22-Aug-2023
  • (2020) Preparing the guide robot to operation. 2020 13th International Conference on Developments in eSystems Engineering (DeSE), pp. 146-151. DOI: 10.1109/DeSE51703.2020.9450795. Online publication date: 14-Dec-2020
  • (2020) Scale-space multi-view bag of words for scene categorization. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-09759-9. Online publication date: 7-Sep-2020
  • (2019) Novel Distributional Visual-Feature Representations for image classification. Multimedia Tools and Applications 78:9, pp. 11313-11336. DOI: 10.1007/s11042-018-6674-1. Online publication date: 1-May-2019
  • (2019) Predicting Entity Mentions in Scientific Literature. The Semantic Web, pp. 379-393. DOI: 10.1007/978-3-030-21348-0_25. Online publication date: 25-May-2019
  • (2018) Feature Similarity and Frequency-Based Weighted Visual Words Codebook Learning Scheme for Human Action Recognition. Image and Video Technology, pp. 326-336. DOI: 10.1007/978-3-319-75786-5_27. Online publication date: 15-Feb-2018
  • (2017) Representing word image using visual word embeddings and RNN for keyword spotting on historical document images. 2017 IEEE International Conference on Multimedia and Expo (ICME), pp. 1368-1373. DOI: 10.1109/ICME.2017.8019403. Online publication date: Jul-2017
  • (2017) Visual language model for keyword spotting on historical mongolian document images. 2017 29th Chinese Control And Decision Conference (CCDC), pp. 1737-1742. DOI: 10.1109/CCDC.2017.7978797. Online publication date: May-2017
  • (2017) Evolving weighting schemes for the Bag of Visual Words. Neural Computing and Applications 28:5, pp. 925-939. DOI: 10.1007/s00521-016-2223-x. Online publication date: 1-May-2017
  • (2016) Improving the BoVW via discriminative visual n-grams and MKL strategies. Neurocomputing 175:PA, pp. 768-781. DOI: 10.1016/j.neucom.2015.10.053. Online publication date: 29-Jan-2016