DOI: 10.1145/1871437.1871552
research-article

A probabilistic topic-connection model for automatic image annotation

Published: 26 October 2010

Abstract

The explosive growth of image data on the Internet has made indexing and automatically annotating images an important yet very challenging task. To that end, sophisticated algorithms and models have been proposed to study the correlation between image content and the corresponding text descriptions. Despite the success of previous work, researchers still face two major difficulties that may undermine their efforts to provide reliable and accurate annotations for images. The first is the lack of a comprehensive benchmark image dataset with high-quality text descriptions. The second is the lack of an effective way to represent image content and associate it with text descriptions. In this paper, we aim to address both problems. To address the first, we use Wikipedia as an external knowledge source and enrich the ontology structure of the ImageNet database with comprehensive and highly reliable text descriptions from Wikipedia articles. To address the second, we develop a Probabilistic Topic-Connection (PTC) model to represent the connection between latent semantic topics in text descriptions and latent patterns in the image feature space. We compare the performance of our model with the currently popular Correspondence LDA (Corr-LDA) model under the same automatic image annotation scenario using cross-validation. Experimental results demonstrate that our model represents the connection between latent semantic topics and latent patterns in the image feature space well, and thus facilitates knowledge organization and understanding of both images and text descriptions.
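The abstract does not give the PTC model's equations, so the following is only a rough sketch of how a connection between latent image patterns and latent text topics could be used to rank annotation words. It assumes a simple factorization, p(w | image) = sum over patterns k and topics t of p(k | image) p(t | k) p(w | t). The function name `annotate`, the three probability tables, and all toy values are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the factorization and every number below are
# assumptions made for this example, not the PTC model as defined in the paper.

def annotate(pattern_given_image, topic_given_pattern, word_given_topic, vocab, top_n=2):
    """Rank words by p(w | image) = sum_k sum_t p(k | image) p(t | k) p(w | t)."""
    n_patterns = len(pattern_given_image)
    n_topics = len(word_given_topic)
    # Step 1: map the image's latent-pattern distribution onto text topics
    # through the (hypothetical) pattern-to-topic connection table.
    topic_given_image = [
        sum(pattern_given_image[k] * topic_given_pattern[k][t] for k in range(n_patterns))
        for t in range(n_topics)
    ]
    # Step 2: mix the per-topic word distributions with those topic weights.
    word_scores = [
        sum(topic_given_image[t] * word_given_topic[t][w] for t in range(n_topics))
        for w in range(len(vocab))
    ]
    # Step 3: return the highest-scoring words as the predicted annotation.
    ranked = sorted(range(len(vocab)), key=lambda w: word_scores[w], reverse=True)
    return [(vocab[w], word_scores[w]) for w in ranked[:top_n]]


vocab = ["cat", "whiskers", "car", "wheel"]
# p(topic | pattern): rows are image patterns, columns are text topics (toy values).
topic_given_pattern = [[0.9, 0.1],
                       [0.2, 0.8]]
# p(word | topic): rows are text topics, columns follow `vocab` (toy values).
word_given_topic = [[0.60, 0.30, 0.05, 0.05],
                    [0.05, 0.05, 0.50, 0.40]]
# p(pattern | image) for one unannotated image (toy values).
pattern_given_image = [0.8, 0.2]

predictions = annotate(pattern_given_image, topic_given_pattern, word_given_topic, vocab)
print(predictions)
```

With these toy tables the image is dominated by the first pattern, which connects mostly to the first topic, so "cat" and "whiskers" are ranked first. In a real system the three tables would be estimated from data, for example by Gibbs sampling as the author tags suggest, rather than written by hand.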


Published In

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN:9781450300995
DOI:10.1145/1871437
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic image annotation
  2. gibbs sampling
  3. image feature extraction
  4. probabilistic models
  5. topic learning

Qualifiers

  • Research-article

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

  • (2022) Topic model-based recommender systems and their applications to cold-start problems. Expert Systems with Applications 202 (117129). DOI: 10.1016/j.eswa.2022.117129. Online publication date: Sep-2022
  • (2016) Part-based clothing image annotation by visual neighbor retrieval. Neurocomputing 213:C (115-124). DOI: 10.1016/j.neucom.2015.12.141. Online publication date: 12-Nov-2016
  • (2015) Learning Forum Posts Topic Discovery and Its Application in Recommendation System. Journal of Software 10:4 (392-402). DOI: 10.17706/jsw.10.4.392-402. Online publication date: Apr-2015
  • (2015) Image auto-annotation via tag-dependent random search over range-constrained visual neighbours. Multimedia Tools and Applications 74:11 (4091-4116). DOI: 10.1007/s11042-013-1811-3. Online publication date: 1-Jun-2015
  • (2014) Bilateral Correspondence Model for Words-and-Pictures Association in Multimedia-Rich Microblogs. ACM Transactions on Multimedia Computing, Communications, and Applications 10:4 (1-21). DOI: 10.1145/2611388. Online publication date: 4-Jul-2014
  • (2012) Author-conference topic-connection model for academic network search. Proceedings of the 21st ACM international conference on Information and knowledge management (2179-2183). DOI: 10.1145/2396761.2398597. Online publication date: 29-Oct-2012
  • (2012) Modeling semantic relations between visual attributes and object categories via dirichlet forest prior. Proceedings of the 21st ACM international conference on Information and knowledge management (1263-1272). DOI: 10.1145/2396761.2398428. Online publication date: 29-Oct-2012
