DOI: 10.1145/2578726.2578758
tutorial

Iterative Random Visual Word Selection

Published: 01 April 2014

Abstract

In content-based image retrieval, one of the most important steps is the construction of image signatures. To do so, some state-of-the-art approaches build a visual vocabulary. In this paper, we propose a new methodology for visual vocabulary construction that achieves high retrieval performance. Moreover, it is computationally inexpensive to build and needs no prior knowledge of the features or dataset used.
Classically, the vocabulary is built by aggregating a certain number of features into centroids using a clustering algorithm. The final centroids are treated as visual "words". Our approach for building a visual vocabulary is based on an iterative random visual word selection that combines a saliency map with a tf-idf scheme. Experimental results show that it outperforms the original "Bag of visual words" approach in both efficiency and effectiveness.
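
As an illustration of the classical pipeline the abstract refers to, the following minimal Python sketch assigns local descriptors to their nearest visual word, builds one histogram per image, and applies a standard tf-idf weighting. It is not the method proposed in the paper: the iterative random selection and the saliency map are not detailed in this abstract and are not reproduced here. The function names (`nearest_word`, `bag_of_words`, `tf_idf`), the toy 2-D descriptors, and the fixed three-word vocabulary are all assumptions made for illustration; in the classical setting the vocabulary would come from a clustering algorithm such as k-means.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to `descriptor` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))


def bag_of_words(descriptors, vocabulary):
    """Count how many descriptors of one image fall on each visual word."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]


def tf_idf(histograms):
    """Weight raw word counts by term frequency times inverse document frequency."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images in which each word occurs at least once.
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    weighted = []
    for h in histograms:
        total = sum(h) or 1  # avoid division by zero for images without descriptors
        weighted.append([
            (h[w] / total) * math.log(n_images / df[w]) if df[w] else 0.0
            for w in range(n_words)
        ])
    return weighted


# Toy data (hypothetical): 2-D "descriptors" and a 3-word vocabulary.
# Real local descriptors such as SIFT are 128-dimensional.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
images = [
    [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9)],
    [(0.0, 0.2), (5.1, 4.9)],
]
histograms = [bag_of_words(img, vocabulary) for img in images]
signatures = tf_idf(histograms)
```

In this toy example the first visual word occurs in both images, so its idf is zero and it contributes nothing to either signature. This is the usual effect of tf-idf: words that appear everywhere are down-weighted because they do not help to discriminate between images.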

Cited By

  • (2018) Image compression based on SVD for BoVW model in fingerprint classification. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, 34:4 (2513-2519). DOI: 10.3233/JIFS-17363. Online publication date: 1-Jan-2018
  • (2017) Introducing Image Saliency Information into Content Based Indexing and Emotional Impact Analysis. Visual Content Indexing and Retrieval with Psycho-Visual Models (75-101). DOI: 10.1007/978-3-319-57687-9_4. Online publication date: 16-Oct-2017
  • (2016) Improving retrieval framework using information gain models. Signal, Image and Video Processing, 11:2 (309-316). DOI: 10.1007/s11760-016-0938-x. Online publication date: 21-Jul-2016

Published In

ICMR '14: Proceedings of International Conference on Multimedia Retrieval
April 2014
564 pages
ISBN:9781450327824
DOI:10.1145/2578726
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Bags of Visual Words
  2. Images Retrieval
  3. Random Word Selection
  4. Saliency Map
  5. Vocabulary Construction

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

ICMR '14: International Conference on Multimedia Retrieval
April 1 - 4, 2014
Glasgow, United Kingdom

Acceptance Rates

ICMR '14 Paper Acceptance Rate 21 of 111 submissions, 19%;
Overall Acceptance Rate 254 of 830 submissions, 31%
