Polysemious visual representation based on feature aggregation for large scale image applications

Song, Xinghang; Jiang, Shuqiang; Wang, Shuhui; Li, Liang; Huang, Qingming

doi:10.1007/s11042-014-1975-5

Published: 25 May 2014

Volume 74, pages 595–611, (2015)

Multimedia Tools and Applications

Abstract

Multiple image features and multiple semantic concepts in images have intrinsic and complex relations. These relations influence the effectiveness of image semantic analysis methods, especially on large scale problems. In this paper, a framework for generating a polysemious image representation through three levels of feature aggregation is proposed. In the codebook level aggregation, a visual dictionary is learned for each feature type, and each image feature can be reconstructed with this dictionary. In the semantic level aggregation, multiple concept distributions are learned with each feature codebook by using an improved local anchor embedding. The polysemious representation for a single feature type is established after this level. In the multiple feature level aggregation, the final polysemious image representation is obtained through multiple feature fusion with a weighted pooling approach. Through the proposed framework, multiple feature fusion and multiple semantic descriptions are both achieved in an integrated way. Experimental evaluations on a large scale image dataset validate the effectiveness of the proposed method.
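
The abstract gives no formulas, so the following is only a rough sketch of how the semantic level and multiple feature level aggregations might look, not the authors' implementation. It assumes the basic local anchor embedding idea (reconstructing a descriptor as a convex combination of its nearest anchors) rather than the improved variant the paper proposes, and it assumes the weighted pooling is a normalised weighted sum of per-feature concept distributions. All names (`anchor_weights`, `fuse_features`, `anchor_concepts`) and all numbers are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1}."""
    n = v.shape[0]
    u = np.sort(v)[::-1]                 # entries in descending order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, n + 1)
    cond = u - css / idx > 0
    rho = idx[cond][-1]                  # largest index that stays positive
    theta = css[cond][-1] / rho
    return np.maximum(v - theta, 0.0)

def anchor_weights(x, anchors, k=3, steps=200):
    """Reconstruct x from its k nearest anchors with convex weights.

    Solves min_z ||U z - x||^2 subject to z on the simplex by projected
    gradient descent (assumed solver; the paper's variant may differ).
    """
    dists = np.linalg.norm(anchors - x, axis=1)
    nearest = np.argsort(dists)[:k]
    U = anchors[nearest].T               # shape (dim, k)
    # Step size 1/L, where L is the squared spectral norm of U.
    lr = 1.0 / (np.linalg.norm(U, 2) ** 2 + 1e-12)
    z = np.full(k, 1.0 / k)
    for _ in range(steps):
        grad = U.T @ (U @ z - x)
        z = project_to_simplex(z - lr * grad)
    w = np.zeros(anchors.shape[0])
    w[nearest] = z                       # sparse weights over all anchors
    return w

def fuse_features(distributions, feature_weights):
    """Weighted pooling of per-feature concept distributions (assumed form)."""
    fw = np.asarray(feature_weights, dtype=float)
    fw = fw / fw.sum()
    return sum(wi * p for wi, p in zip(fw, distributions))

# Hypothetical toy data: 4 anchors in 2-D, each with a 2-concept distribution.
anchors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
anchor_concepts = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
x = np.array([0.25, 0.25])

w = anchor_weights(x, anchors)           # semantic level: anchor weights
p_feature_a = w @ anchor_concepts        # concept distribution, feature A
p_feature_b = np.array([0.9, 0.1])       # made-up distribution, feature B
fused = fuse_features([p_feature_a, p_feature_b], [1.0, 3.0])
print(w, p_feature_a, fused)
```

Because the anchor weights and the normalised feature weights are both convex, each intermediate vector and the fused result remain valid probability distributions over concepts, which is one reason a convex reconstruction is a convenient assumption here.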

Acknowledgments

This work was supported in part by the National Basic Research Program of China (973 Program): 2012CB316400, in part by the National Natural Science Foundation of China: 61322212, 61025011, 61332016, in part by the Key Technologies R&D Program of China: 2012BAH18B02, and in part by the National Hi-Tech Development Program (863 Program) of China: 2014AA015202.

Author information

Authors and Affiliations

Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology (ICT), No. 6 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing, China
Xinghang Song, Shuqiang Jiang & Shuhui Wang
University of Chinese Academy of Sciences, Beijing, China
Liang Li & Qingming Huang

Corresponding authors

Correspondence to Xinghang Song or Shuqiang Jiang.

About this article

Cite this article

Song, X., Jiang, S., Wang, S. et al. Polysemious visual representation based on feature aggregation for large scale image applications. Multimed Tools Appl 74, 595–611 (2015). https://doi.org/10.1007/s11042-014-1975-5

Issue Date: January 2015
DOI: https://doi.org/10.1007/s11042-014-1975-5
