tutorial

Iterative Random Visual Word Selection

Authors:

Thierry Urruty,

Syntyche Gbèhounou,

Christine Fernandez

ICMR '14: Proceedings of International Conference on Multimedia Retrieval

Pages 249 - 256

https://doi.org/10.1145/2578726.2578758

Published: 01 April 2014

Abstract

In content-based image retrieval, one of the most important steps is the construction of image signatures. To this end, some state-of-the-art approaches build a visual vocabulary. In this paper, we propose a new methodology for visual vocabulary construction that achieves high retrieval performance. Moreover, it is computationally inexpensive to build and requires no prior knowledge of the features or dataset used.

Classically, the vocabulary is built by aggregating a certain number of features into centroids using a clustering algorithm. The final centroids are treated as visual "words". Our approach for building a visual vocabulary is based on an iterative random selection of visual words that combines a saliency map with a tf-idf scheme. Experimental results show that it outperforms the original "Bag of visual words" approach in both efficiency and effectiveness.

Cited By

Andono P, Supriyanto C, Nugroho S (2018). Image compression based on SVD for BoVW model in fingerprint classification. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 34:4 (2513-2519). DOI: 10.3233/JIFS-17363. Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.3233/JIFS-17363
Gbehounou S, Urruty T, Lecellier F, Fernandez-Maloigne C (2017). Introducing Image Saliency Information into Content Based Indexing and Emotional Impact Analysis. Visual Content Indexing and Retrieval with Psycho-Visual Models (75-101). DOI: 10.1007/978-3-319-57687-9_4. Online publication date: 16-Oct-2017.
https://doi.org/10.1007/978-3-319-57687-9_4
Le H, Urruty T, Gbèhounou S, Lecellier F, Martinet J, Fernandez-Maloigne C (2016). Improving retrieval framework using information gain models. Signal, Image and Video Processing, 11:2 (309-316). DOI: 10.1007/s11760-016-0938-x. Online publication date: 21-Jul-2016.
https://doi.org/10.1007/s11760-016-0938-x

Index Terms

Iterative Random Visual Word Selection
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics
    1. Image manipulation
      1. Image processing
2. Information systems

Published In

ICMR '14: Proceedings of International Conference on Multimedia Retrieval

April 2014

564 pages

ISBN:9781450327824

DOI:10.1145/2578726

Conference Chairs:
Mohan Kankanhalli, National University of Singapore
Stefan Rueger, The Open University, UK
R. Manmatha, A9.com, USA

General Chairs:
Joemon Jose, University of Glasgow, UK
Keith van Rijsbergen, University of Glasgow, UK

Copyright © 2014 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

In-Cooperation

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial
Research
Refereed limited

Conference

ICMR '14

ICMR '14: International Conference on Multimedia Retrieval

April 1 - 4, 2014

Glasgow, United Kingdom

Acceptance Rates

ICMR '14 Paper Acceptance Rate 21 of 111 submissions, 19%;

Overall Acceptance Rate 254 of 830 submissions, 31%
