SemanGist: A Local Semantic Image Representation

Wang, Dong; Liu, Xiaobing; Wang, Duanpeng; Li, Jianmin; Zhang, Bo

doi:10.1007/978-3-540-89796-5_64

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5353)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

Although many image features have been proposed, no single feature is optimal enough to replace all the others in multimedia analysis applications such as image annotation. In this paper, we propose a novel image representation, Semantic Gist (SemanGist), that combines the merits of multiple features automatically. Given a local image patch, SemanGist converts multiple low-level features of the patch into compact prediction scores for a few predefined semantic categories. To this end, a discriminative multi-label boosting algorithm is adopted. Because the SemanGist output is local, it allows semantic spatial context among adjacent patches to be incorporated. For applications such as image annotation, considering label compatibility in this way may further reduce annotation errors. The same boosting algorithm is applied to the SemanGist representation, together with the low-level features, to enforce label compatibility. Experiments on an image annotation task show that SemanGist not only yields a compact representation but also incorporates spatial context at low run-time computational cost.
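To illustrate the second stage described in the abstract, here is a minimal Python sketch. It is not the authors' implementation. It assumes that patches lie on a regular grid, that a first-stage classifier has already produced one score per semantic category for each patch, and that spatial context can be summarised by the mean score of the 4-connected neighbours. These choices, and the names `neighbor_mean` and `context_features`, are hypothetical. The boosting algorithm itself is not reproduced; the sketch only shows how a second-stage input vector might be assembled.

```python
from typing import List

Scores = List[float]        # one prediction score per semantic category
Grid = List[List[Scores]]   # grid[row][col] -> scores of that patch


def neighbor_mean(grid: Grid, r: int, c: int) -> Scores:
    """Average the category scores of the 4-connected neighbours of (r, c).

    Assumption: mean pooling over neighbours stands in for whatever
    spatial-context encoding the paper actually uses.
    """
    rows, cols = len(grid), len(grid[0])
    k = len(grid[r][c])
    neighbours = [
        grid[rr][cc]
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        if 0 <= rr < rows and 0 <= cc < cols
    ]
    if not neighbours:  # a 1x1 grid has no neighbours
        return [0.0] * k
    return [sum(n[i] for n in neighbours) / len(neighbours) for i in range(k)]


def context_features(grid: Grid, low_level: Grid) -> Grid:
    """Per patch, concatenate low-level features, the patch's own
    category scores, and the neighbour-mean category scores."""
    return [
        [
            low_level[r][c] + grid[r][c] + neighbor_mean(grid, r, c)
            for c in range(len(grid[0]))
        ]
        for r in range(len(grid))
    ]


# Toy 2x2 grid with two hypothetical categories and one low-level feature.
scores = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [1.0, 0.0]]]
low = [[[0.5], [0.5]],
       [[0.5], [0.5]]]
features = context_features(scores, low)
```

In this toy case each patch ends up with a 5-dimensional vector (1 low-level value, 2 own scores, 2 context scores). A second-stage classifier trained on such vectors would see both the patch's own evidence and the labels its neighbours favour.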

Author information

Authors and Affiliations

State Key Laboratory of Intelligent Technology and System, Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China
Dong Wang, Xiaobing Liu, Duanpeng Wang, Jianmin Li & Bo Zhang

Editor information

Editors and Affiliations

Department of Engineering Science, National Cheng Kung University, No.1, University Road, 701, Tainan City, Taiwan
Yueh-Min Ray Huang
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95, Zhongguancun East Road, 100190, Beijing, China
Changsheng Xu
Institute of Biomedical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Kuo-Sheng Cheng
Department of Electrical Engineering, National Cheng Kung University, No. 1, University Road, 701, Tainan City, Taiwan
Jar-Ferr Kevin Yang
Department of Electrical and Computer Engineering, Concordia University, S-EV005.139, 1515 St. Catherine West, Montreal, H4G 2W1, Quebec, Canada
M. N. S. Swamy
Microsoft Research Asia, 5/F, Beijing Sigma Center, No. 49, Zhichun Road, Hai Dian District, 100080, Beijing, China
Shipeng Li
Department of Information Management, National Kaohsiung University of Applied Sciences, No. 415, Jiangong Road, Sanmin District, 80778, Kaohsiung, Taiwan
Jen-Wen Ding

About this paper

Cite this paper

Wang, D., Liu, X., Wang, D., Li, J., Zhang, B. (2008). SemanGist: A Local Semantic Image Representation. In: Huang, YM.R., et al. Advances in Multimedia Information Processing - PCM 2008. PCM 2008. Lecture Notes in Computer Science, vol 5353. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89796-5_64

DOI: https://doi.org/10.1007/978-3-540-89796-5_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89795-8
Online ISBN: 978-3-540-89796-5
eBook Packages: Computer Science, Computer Science (R0)
