SemanGist: A Local Semantic Image Representation

  • Conference paper
Advances in Multimedia Information Processing - PCM 2008 (PCM 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5353)

Abstract

Although various kinds of image features have been proposed, no single feature is good enough to replace all the others in multimedia analysis applications such as image annotation. In this paper, we propose a novel image representation, Semantic Gist (SemanGist), that combines the merits of multiple features automatically. Given a local image patch, SemanGist converts multiple low-level features of the patch into compact prediction scores for a few predefined semantic categories. To this end, a discriminative multi-label boosting algorithm is adopted. Because the SemanGist output is local, it allows semantic spatial context among adjacent patches to be incorporated. For applications like image annotation, this may further reduce annotation errors by taking label compatibility into account. The same boosting algorithm is applied to the SemanGist representation, together with the low-level features, to enforce label compatibility. Experiments on an image annotation task show that SemanGist not only achieves a compact representation but also incorporates spatial context at low run-time computational cost.
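
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: independent one-vs-rest discrete AdaBoost over decision stumps stands in for the multi-label boosting method, the 4-neighbourhood and zero padding at image borders are assumptions, and the toy features, the two categories (read them as "sky" and "grass") and all function names are hypothetical.

```python
import math

def train_stumps(X, y, rounds=5):
    """Discrete AdaBoost with decision stumps for one label (y in {-1, +1}).
    Stand-in for the paper's multi-label boosting, which is not reproduced here."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model = []  # each entry: (feature index, threshold, polarity, weight alpha)
    for _ in range(rounds):
        best = None
        for j in range(d):
            for t in sorted({row[j] for row in X}):
                for s in (1, -1):
                    # weighted error of the stump "predict s if x[j] > t else -s"
                    err = sum(wi for wi, row, yi in zip(w, X, y)
                              if (s if row[j] > t else -s) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = min(max(err, 1e-9), 1 - 1e-9)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, t, s, alpha))
        # re-weight the samples so that mistakes count more in the next round
        w = [wi * math.exp(-alpha * yi * (s if row[j] > t else -s))
             for wi, row, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model

def score(model, x):
    """Real-valued boosted score of one category for feature vector x."""
    return sum(a * (s if x[j] > t else -s) for j, t, s, a in model)

def semangist(models, x):
    """Stage 1: map a patch's low-level features to one score per category."""
    return [score(m, x) for m in models]

def with_context(grid_feats, grid_scores, r, c):
    """Stage 2 input: low-level features of patch (r, c), its own scores,
    and the scores of its 4 neighbours (zeros outside the image)."""
    rows, cols = len(grid_scores), len(grid_scores[0])
    k = len(grid_scores[r][c])
    out = list(grid_feats[r][c]) + list(grid_scores[r][c])
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        out += grid_scores[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else [0.0] * k
    return out

# Toy data: 4 patches, 3 low-level features, 2 categories (all hypothetical).
X = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1], [0.1, 0.9, 0.3], [0.2, 0.8, 0.4]]
Y = [[1, 1, -1, -1],   # category 0 ("sky")
     [-1, -1, 1, 1]]   # category 1 ("grass")
models = [train_stumps(X, y) for y in Y]
scores = semangist(models, X[0])

# A 1x2 grid of patches; the context vector would feed a second boosting pass.
grid_feats = [[X[0], X[2]]]
grid_scores = [[semangist(models, x) for x in row] for row in grid_feats]
ctx = with_context(grid_feats, grid_scores, 0, 0)
```

In this sketch the second stage would call `train_stumps` again on vectors such as `ctx`. The score part of each vector grows with the number of categories and neighbours rather than with the dimensionality of the low-level features, which is one way to read the abstract's claim of a compact representation.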

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, D., Liu, X., Wang, D., Li, J., Zhang, B. (2008). SemanGist: A Local Semantic Image Representation. In: Huang, YM.R., et al. Advances in Multimedia Information Processing - PCM 2008. PCM 2008. Lecture Notes in Computer Science, vol 5353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89796-5_64

  • DOI: https://doi.org/10.1007/978-3-540-89796-5_64

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89795-8

  • Online ISBN: 978-3-540-89796-5

  • eBook Packages: Computer Science, Computer Science (R0)
