Multi-level annotation of natural scenes using dominant image components and semantic concepts

Published: 10 October 2004
DOI: 10.1145/1027527.1027660

Abstract

Automatic image annotation is a promising way to support semantic image retrieval via keywords. In this paper, we propose a multi-level approach to annotating the semantics of natural scenes by using both the dominant image components (salient objects) and the relevant semantic concepts. To achieve automatic image annotation at the content level, we use salient objects as the dominant image components for image content representation and feature extraction. To support automatic image annotation at the concept level, we develop a novel image classification technique that maps images to the most relevant semantic image concepts. Support Vector Machine (SVM) classifiers are used to learn the detection functions for the pre-defined salient objects, and finite mixture models are used for semantic concept interpretation and modeling. We propose an adaptive EM algorithm that determines the optimal model structure and model parameters simultaneously. We also demonstrate that our algorithms are effective for multi-level annotation of natural scenes in a large-scale image dataset.
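The abstract states that each semantic concept is modeled with a finite mixture model and that an adaptive EM algorithm selects the model structure (the number of mixture components) together with the parameters. The adaptive EM itself is not specified on this page, so the following is only a minimal sketch of a simpler, standard stand-in: plain EM for a one-dimensional Gaussian mixture, refitted for several candidate component counts and scored with the Bayesian information criterion (BIC). The function names (`fit_gmm_1d`, `bic`, `select_components`), the 1-D features, the quantile initialization and the BIC criterion are all illustrative assumptions, not the authors' method.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with plain EM (hypothetical helper).

    Means start at evenly spaced quantiles of the data and variances at the
    overall data variance. Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    s = sorted(data)
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(data) / n
    variances = [max(sum((x - mu) ** 2 for x in data) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: resp[i][j] = posterior probability that x_i came from component j.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids degenerate components
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in data
    )
    return weights, means, variances, log_lik


def bic(log_lik, k, n):
    """BIC of a 1-D k-component mixture: (k - 1) weights + k means + k variances."""
    return (3 * k - 1) * math.log(n) - 2.0 * log_lik


def select_components(data, max_k=4):
    """Refit for k = 1..max_k and keep the model with the lowest BIC."""
    best = None
    for k in range(1, max_k + 1):
        weights, means, variances, ll = fit_gmm_1d(data, k)
        score = bic(ll, k, len(data))
        if best is None or score < best[0]:
            best = (score, k, weights, means, variances)
    _, k, weights, means, variances = best
    return k, weights, means, variances


if __name__ == "__main__":
    # Synthetic 1-D "feature" drawn from two well-separated clusters.
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(8.0, 1.0) for _ in range(200)]
    k, weights, means, variances = select_components(data)
    print(k, [round(m, 2) for m in sorted(means)])
```

Keeping model fitting and model selection separate keeps the sketch short. An adaptive scheme of the kind the abstract describes could instead change the component count during EM itself (split-and-merge EM variants work this way) rather than refitting from scratch for every candidate count.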

Information

Published In

ACM Conferences
MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia
October 2004
1028 pages
ISBN:1581138938
DOI:10.1145/1027527
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive EM algorithm
  2. automatic image annotation
  3. salient objects

Qualifiers

  • Article

Conference

MM04

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 3
Reflects downloads up to 24 Jan 2025.

Cited By

  • (2017) Early versus Late Dimensionality Reduction of Bag-of-Words Feature Representation for Image Classification. Proceedings of the 4th International Conference on Bioinformatics Research and Applications, pp. 42-45. DOI: 10.1145/3175587.3175598. Online publication date: 8 Dec 2017.
  • (2017) Safe binary particle swam algorithm for an enhanced unsupervised label refinement in automatic face annotation. Multimedia Tools and Applications 76:18, pp. 18339-18359. DOI: 10.1007/s11042-016-4058-y. Online publication date: 1 Sep 2017.
  • (2016) Social Diffusion Analysis With Common-Interest Model for Image Annotation. IEEE Transactions on Multimedia 18:4, pp. 687-701. DOI: 10.1109/TMM.2015.2477277. Online publication date: Apr 2016.
  • (2016) Statistical modeling for automatic image indexing and retrieval. Neurocomputing 207:C, pp. 105-119. DOI: 10.1016/j.neucom.2016.04.033. Online publication date: 26 Sep 2016.
  • (2015) Manifold Kernel Metric Learning for Larger-Scale Image Annotation. IEICE Transactions on Information and Systems E98.D:7, pp. 1396-1400. DOI: 10.1587/transinf.2014EDL8216. Online publication date: 2015.
  • (2015) Uniform Detection in Social Image Streams. 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE), pp. 180-185. DOI: 10.1109/KSE.2015.63. Online publication date: Oct 2015.
  • (2014) Mining Weakly Labeled Web Facial Images for Search-Based Face Annotation. IEEE Transactions on Knowledge and Data Engineering 26:1, pp. 166-179. DOI: 10.1109/TKDE.2012.240. Online publication date: 1 Jan 2014.
  • (2014) Modeling label dependencies in kernel learning for image annotation. 2014 IEEE International Conference on Image Processing (ICIP), pp. 5886-5890. DOI: 10.1109/ICIP.2014.7026189. Online publication date: Oct 2014.
  • (2013) Learning to name faces. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pp. 443-452. DOI: 10.1145/2484028.2484040. Online publication date: 28 Jul 2013.
  • (2013) Image Semantic Information Mining Algorithm by Non-negative Matrix Factorization. Proceedings of the 2013 Fourth International Conference on Intelligent Systems Design and Engineering Applications, pp. 345-348. DOI: 10.1109/ISDEA.2013.482. Online publication date: 6 Nov 2013.
