Abstract
Multiple image features and the multiple semantic concepts in images have intrinsic and complex relations. These relations influence the effectiveness of image semantic analysis methods, especially on large-scale problems. In this paper, a framework for generating a polysemious image representation through three levels of feature aggregation is proposed. In the codebook level aggregation, a visual dictionary is learned for each feature type, and each image feature can be reconstructed with its dictionary. In the semantic level aggregation, the multiple concept distributions are learned with each feature codebook by using an improved local anchor embedding. The polysemious representation for a single feature type is established at this level. In the multiple feature level aggregation, the final polysemious image representation is obtained by fusing multiple features with a weighted pooling approach. Through the proposed framework, multiple feature fusion and multiple semantic descriptions are both achieved in an integrated way. Experimental evaluations on a large-scale image dataset validate the effectiveness of the proposed method.
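The three aggregation levels can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the inverse-distance weighting (a simple stand-in for the simplex-constrained solve used by local anchor embedding), the parameter `k`, and all toy values are assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def anchor_weights(x, codebook, k=2):
    """Codebook level: non-negative weights over the k nearest codewords.

    Assumption: inverse-distance weights normalised to sum to 1 replace the
    constrained least-squares reconstruction of local anchor embedding.
    """
    dists = [euclidean(x, c) for c in codebook]
    nearest = sorted(range(len(codebook)), key=lambda i: dists[i])[:k]
    inv = {i: 1.0 / (dists[i] + 1e-12) for i in nearest}
    total = sum(inv.values())
    return {i: w / total for i, w in inv.items()}

def concept_distribution(x, codebook, anchor_concepts, k=2):
    """Semantic level: mix the concept distributions of the chosen codewords.

    anchor_concepts[i] is a (hypothetical) concept distribution attached to
    codeword i; the result is a concept distribution for the feature x.
    """
    weights = anchor_weights(x, codebook, k)
    n_concepts = len(anchor_concepts[0])
    return [sum(w * anchor_concepts[i][c] for i, w in weights.items())
            for c in range(n_concepts)]

def weighted_pooling(distributions, feature_weights):
    """Multiple feature level: weighted average of per-feature distributions."""
    total = sum(feature_weights)
    n_concepts = len(distributions[0])
    return [sum(w * d[c] for w, d in zip(feature_weights, distributions)) / total
            for c in range(n_concepts)]

# Toy usage: two feature types, two concepts (all values are made up).
color_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
color_concepts = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
texture_codebook = [[0.0], [1.0]]
texture_concepts = [[0.7, 0.3], [0.1, 0.9]]

p_color = concept_distribution([0.1, 0.2], color_codebook, color_concepts)
p_texture = concept_distribution([0.8], texture_codebook, texture_concepts)
fused = weighted_pooling([p_color, p_texture], feature_weights=[2.0, 1.0])
print(fused)
```

Because each level only forms convex combinations, the fused vector remains a valid distribution over concepts, which is what lets a single image carry several semantic descriptions at once.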
Acknowledgments
This work was supported in part by the National Basic Research Program of China (973 Program): 2012CB316400, in part by the National Natural Science Foundation of China: 61322212, 61025011, 61332016, in part by the Key Technologies R&D Program of China: 2012BAH18B02, and in part by the National Hi-Tech Development Program (863 Program) of China: 2014AA015202.
Cite this article
Song, X., Jiang, S., Wang, S. et al. Polysemious visual representation based on feature aggregation for large scale image applications. Multimed Tools Appl 74, 595–611 (2015). https://doi.org/10.1007/s11042-014-1975-5
DOI: https://doi.org/10.1007/s11042-014-1975-5