Polysemious visual representation based on feature aggregation for large scale image applications

Multimedia Tools and Applications

Abstract

Multiple image features and multiple semantic concepts in images have intrinsic and complex relations. These relations affect the effectiveness of image semantic analysis methods, especially on large-scale problems. This paper proposes a framework that generates a polysemious image representation through three levels of feature aggregation. In codebook-level aggregation, a visual dictionary is learned for each feature type, and each image feature can be reconstructed from that dictionary. In semantic-level aggregation, multiple concept distributions are learned for each feature codebook using an improved local anchor embedding, which yields a polysemious representation for a single feature type. In multiple-feature-level aggregation, the final polysemious image representation is obtained by fusing the feature types with a weighted pooling approach. The proposed framework thus achieves multiple feature fusion and multiple semantic description in an integrated way. Experimental evaluations on a large-scale image dataset validate the effectiveness of the proposed method.
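The three aggregation levels can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract gives no formulas, so the codebooks below are fixed toy anchors rather than learned dictionaries, inverse-distance weights over the nearest anchors stand in for the improved local anchor embedding, and all function names, toy values, and fusion weights are assumptions made for illustration.

```python
import math


def nearest_anchor_weights(x, anchors, k=2):
    """Codebook level: express x with its k nearest codewords.

    Assumption: inverse-distance weights replace the constrained
    reconstruction that local anchor embedding would solve. The weights
    are non-negative and sum to 1.
    """
    dists = [math.dist(x, a) for a in anchors]
    nearest = sorted(range(len(anchors)), key=lambda i: dists[i])[:k]
    inv = {i: 1.0 / (dists[i] + 1e-12) for i in nearest}
    total = sum(inv.values())
    weights = [0.0] * len(anchors)
    for i, value in inv.items():
        weights[i] = value / total
    return weights


def concept_distribution(weights, anchor_concepts):
    """Semantic level: mix the per-codeword concept distributions."""
    n_concepts = len(anchor_concepts[0])
    return [
        sum(w * c[j] for w, c in zip(weights, anchor_concepts))
        for j in range(n_concepts)
    ]


def weighted_pooling(per_feature, feature_weights):
    """Multiple-feature level: weighted average over feature types."""
    total = sum(feature_weights)
    n_concepts = len(per_feature[0])
    return [
        sum(fw * d[j] for fw, d in zip(feature_weights, per_feature)) / total
        for j in range(n_concepts)
    ]


# Hypothetical toy codebooks for two feature types and two concepts.
color_anchors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
color_concepts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
texture_anchors = [(0.0,), (1.0,)]
texture_concepts = [[0.8, 0.2], [0.2, 0.8]]

# One image described by one color feature and one texture feature.
color_dist = concept_distribution(
    nearest_anchor_weights((0.25, 0.0), color_anchors), color_concepts
)
texture_dist = concept_distribution(
    nearest_anchor_weights((0.5,), texture_anchors), texture_concepts
)

# Equal fusion weights are an arbitrary choice for this example.
pooled = weighted_pooling([color_dist, texture_dist], [0.5, 0.5])
print(pooled)
```

Because every stage outputs a convex combination of probability vectors, the pooled vector is itself a distribution over concepts, which is what allows a single image to carry several semantic descriptions at once.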

Acknowledgments

This work was supported in part by the National Basic Research Program of China (973 Program): 2012CB316400, in part by the National Natural Science Foundation of China: 61322212, 61025011, 61332016, in part by the Key Technologies R&D Program of China: 2012BAH18B02, and in part by the National Hi-Tech Development Program (863 Program) of China: 2014AA015202.

Author information

Correspondence to Xinghang Song or Shuqiang Jiang.

Cite this article

Song, X., Jiang, S., Wang, S. et al. Polysemious visual representation based on feature aggregation for large scale image applications. Multimed Tools Appl 74, 595–611 (2015). https://doi.org/10.1007/s11042-014-1975-5

  • DOI: https://doi.org/10.1007/s11042-014-1975-5