DOI: 10.1145/2072298.2072020
short-paper

Refining local descriptors by embedding semantic information for visual categorization

Published: 28 November 2011

Abstract

Local descriptor extraction and vector quantization are important components of the widely used Bag-of-Features (BoF) model for visual categorization. This paper proposes a simple and efficient approach that refines local descriptors before vector quantization by embedding semantic information. The original local descriptors are transformed by a sequence of category-independent and category-dependent bases. In particular, the category-dependent basis is learned by minimizing a joint loss over local descriptors from different categories with a shared regularization penalty, which can be formulated as a linear programming problem. The transformed descriptors are then quantized and aggregated over the visual vocabulary. Experiments on the PASCAL VOC 2007 benchmark, including quantitative comparisons with several state-of-the-art approaches, demonstrate the effectiveness of the proposed approach.
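The generic pipeline the abstract describes (map descriptors through a basis, quantize them against a visual vocabulary, aggregate the result) can be illustrated with a minimal NumPy sketch. It assumes the refinement is a plain linear projection and that aggregation is an L1-normalized count histogram with hard nearest-word assignment. The basis is a hypothetical placeholder supplied by the caller; it is not learned with the paper's linear program. The function names `refine_descriptors`, `quantize`, and `bof_histogram` are illustrative only.

```python
import numpy as np


def refine_descriptors(descriptors, basis):
    """Map raw local descriptors (n x d) through a linear basis (d x k).

    `basis` is a hypothetical stand-in for the stacked category-independent
    and category-dependent bases; it is NOT learned here.
    """
    return np.asarray(descriptors, dtype=float) @ np.asarray(basis, dtype=float)


def quantize(descriptors, vocabulary):
    """Hard-assign each descriptor (n x k) to its nearest visual word (m x k)."""
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Pairwise differences via broadcasting: (n, 1, k) - (1, m, k) -> (n, m, k)
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    # Squared Euclidean distance to every visual word, shape (n, m)
    sq_dists = np.einsum("nmk,nmk->nm", diffs, diffs)
    return np.argmin(sq_dists, axis=1)


def bof_histogram(assignments, vocab_size):
    """Aggregate word assignments into an L1-normalized Bag-of-Features histogram."""
    counts = np.bincount(assignments, minlength=vocab_size).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts


# Toy usage: 2-D descriptors, identity basis, two-word vocabulary.
raw = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
basis = np.eye(2)                              # hypothetical basis
vocab = np.array([[0.0, 0.0], [10.0, 10.0]])   # typically k-means centres in practice
words = quantize(refine_descriptors(raw, basis), vocab)
hist = bof_histogram(words, vocab_size=2)      # two descriptors hit word 0, one hits word 1
```

Hard assignment and L1 normalization are chosen here only for simplicity; soft assignment and other normalizations are also common in BoF pipelines.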

Published In

MM '11: Proceedings of the 19th ACM international conference on Multimedia
November 2011
944 pages
ISBN:9781450306164
DOI:10.1145/2072298
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. local descriptor
  2. semantic information
  3. visual categorization

Qualifiers

  • Short-paper

Conference

MM '11: ACM Multimedia Conference
November 28 - December 1, 2011
Scottsdale, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 117
  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 1

Reflects downloads up to 28 Feb 2025.