Article

Regularizing translation models for better automatic image annotation

Authors:

Joyce Y. Chai

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management

Pages 350 - 359

https://doi.org/10.1145/1031171.1031242

Published: 13 November 2004

Abstract

The goal of automatic image annotation is to generate annotations that describe the content of images. Statistical machine translation models have been successfully applied to the automatic image annotation task [8]. This approach views annotating an image as translating its content from a 'visual language' into textual words. One problem with existing translation models is that common words are usually associated with too many different image regions. As a result, uncommon words have little chance of being used to annotate images. Uncommon words are important for automatic image annotation because they are often used in queries. In this paper, we propose two modified translation models for automatic image annotation, namely the normalized translation model and the regularized translation model, that specifically address the problem of common annotation words. The basic idea is to raise the number of blobs that are associated with uncommon words. The normalized translation model realizes this by scaling the translation probabilities of different words with different factors. The regularized translation model achieves the same goal by introducing a special Dirichlet prior. An empirical study with the Corel dataset shows that both modified translation models substantially outperform the original translation model and several existing approaches to automatic image annotation.
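The effect of per-word scaling can be illustrated with a toy example. The sketch below is not the paper's model: the abstract does not give the scaling factors, so the code assumes a hypothetical factor of one over the sum of a word's translation probabilities across all blobs. The table values, the blob names, and the functions `annotate` and `normalize_per_word` are likewise made up for illustration.

```python
def annotate(trans, blobs, top_k=1):
    """Rank words by the average of P(word | blob) over the image's blobs."""
    scores = {
        word: sum(probs[b] for b in blobs) / len(blobs)
        for word, probs in trans.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def normalize_per_word(trans):
    """Scale each word's probabilities by a word-specific factor.

    Assumption: the factor is 1 / (sum of that word's probabilities over
    all blobs). The factors used in the paper may differ.
    """
    scaled = {}
    for word, probs in trans.items():
        total = sum(probs.values())
        scaled[word] = {b: p / total for b, p in probs.items()}
    return scaled


# Hypothetical translation table: trans[word][blob] = P(word | blob).
# "sky" is a common word that is associated with every blob.
trans = {
    "sky":   {"b1": 0.6, "b2": 0.6, "b3": 0.5},
    "tiger": {"b1": 0.1, "b2": 0.1, "b3": 0.4},
    "grass": {"b1": 0.3, "b2": 0.3, "b3": 0.1},
}

image_blobs = ["b3"]  # an image whose only blob is b3
print(annotate(trans, image_blobs))                      # unscaled table
print(annotate(normalize_per_word(trans), image_blobs))  # scaled table
```

With the unscaled table the common word "sky" wins even for blob b3, where "tiger" has its strongest association. After per-word scaling, "tiger" ranks first because its probability mass is concentrated on that blob.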

References

[1] Barnard, K., P. Duygulu, and D. Forsyth. Clustering Art. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2001.

[2] Barnard, K., P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, and M. I. Jordan. Matching Words and Pictures. Journal of Machine Learning Research, 2003. 3: p. 1107--1135.

[3] Blei, D. and M. Jordan. Modeling Annotated Data. In Proceedings of the 26th International Conference on Research and Development in Information Retrieval (SIGIR). 2003.

[4] Brown, P., S. D. Pietra, V. D. Pietra, and R. Mercer. The Mathematics of Statistical Machine Translation. Computational Linguistics, 1993. 19(2): p. 263--311.

[5] Chang, E., K. Goh, G. Sychay, and G. Wu. CBSA: Content-based Soft Annotation for Multimodal Image Retrieval using Bayes Point Machines. IEEE Transactions on Circuits and Systems for Video Technology, 2003. 13(1): p. 26--38.

[6] Cusano, C., G. Ciocca, and R. Schettini. Image Annotation using SVM. In Proceedings of Internet Imaging IV, Vol. SPIE 5304. 2004.

[7] Dempster, A. P., N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 1977. 39(1): p. 1--38.

[8] Duygulu, P., K. Barnard, N. de Freitas, and D. A. Forsyth. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. In Proceedings of the 7th European Conference on Computer Vision. 2002.

[9] Jeon, J., V. Lavrenko, and R. Manmatha. Automatic Image Annotation and Retrieval using Cross-Media Relevance Models. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2003.

[10] Jin, R., C. X. Zhai, and A. G. Hauptmann. Title Language Model for Information Retrieval. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002). 2002.

[11] Lavrenko, V. and B. Croft. Relevance-based Language Models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2001.

[12] Lavrenko, V., R. Manmatha, and J. Jeon. A Model for Learning the Semantics of Pictures. In Proceedings of Advances in Neural Information Processing Systems. 2003.

[13] Li, J. and J. Z. Wang. Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2003. 25(9): p. 1075--1088.

[14] Maron, O. Learning from Ambiguity. 1998, MIT.

[15] Monay, F. and D. Gatica-Perez. On Image Auto-Annotation with Latent Space Models. In Proc. ACM International Conference on Multimedia. 2003.

[16] Mori, Y., H. Takahashi, and R. Oka. Image-to-Word Transformation Based on Dividing and Vector Quantizing Images With Words. In MISRM'99 First International Workshop on Multimedia Intelligent Storage and Retrieval Management. 1999.

[17] Ponte, J. A Language Modeling Approach to Information Retrieval. Department of Computer Science, Univ. of Massachusetts at Amherst, 1998.

[18] Schittkowski, K. NLPQL: A FORTRAN-Subroutine Solving Constrained Nonlinear Programming Problems. Annals of Operations Research, 1985. 5: p. 485--500.

[19] Zhai, C. X. and J. Lafferty. Model-based Feedback in the KL-divergence Retrieval Model. In Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM 2001). 2001.

Index Terms

Regularizing translation models for better automatic image annotation
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management

November 2004

678 pages

ISBN:1581138741

DOI:10.1145/1031171

General Chair: David Grossman (Illinois Institute of Technology)

Program Chairs: Luis Gravano (Columbia University), ChengXiang Zhai (University of Illinois at Urbana-Champaign), Otthein Herzog (University of Bremen, Germany), David A. Evans (Clairvoyance Corporation)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM04: Conference on Information and Knowledge Management

November 8 - 13, 2004

Washington, D.C., USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 18

Total Downloads: 505

Downloads (Last 12 months): 1

Downloads (Last 6 weeks): 1

Reflects downloads up to 20 Jan 2025

Cited By

Soni V, Mathariya S, Soni R (2014). User friendly approach for video search technique using text and image as query. 2014 Conference on IT in Business, Industry and Government (CSIBIG), pp. 1-12. Online publication date: Mar-2014.
https://doi.org/10.1109/CSIBIG.2014.7056999

Irfanullah, Aslam N, Loo J, Roohullah, Loomes M (2011). Adding semantics to the reliable object annotated image databases. Procedia Computer Science 3, pp. 414-419. Online publication date: 2011.
https://doi.org/10.1016/j.procs.2010.12.069

Jin Y, Khan L, Prabhakaran B (2010). Knowledge Based Image Annotation Refinement. Journal of Signal Processing Systems 58:3, pp. 387-406. Online publication date: 1-Mar-2010.
https://dl.acm.org/doi/10.1007/s11265-009-0391-y

Kwaśnicka H, Paradowski M (2010). Machine Learning Methods in Automatic Image Annotation. Advances in Machine Learning II, pp. 387-411. Online publication date: 2010.
https://doi.org/10.1007/978-3-642-05179-1_18

Paradowski M, Kwasnicka H (2009). Improved Resulted Word Counts Optimizer for Automatic Image Annotation Problem. Fundamenta Informaticae 96:4, pp. 435-463. Online publication date: 1-Dec-2009.
https://dl.acm.org/doi/10.5555/2365165.2365169

Li W, Sun M (2008). Multi-modal Multi-label Semantic Indexing of Images Using Unlabeled Data. Proceedings of the 2008 International Conference on Advanced Language Processing and Web Information Technology, pp. 204-209. Online publication date: 23-Jul-2008.
https://dl.acm.org/doi/10.1109/ALPIT.2008.107

Kwasnicka H, Paradowski M (2008). Resulted word counts optimization - A new approach for better automatic image annotation. Pattern Recognition 41:12, pp. 3562-3571. Online publication date: 1-Dec-2008.
https://dl.acm.org/doi/10.1016/j.patcog.2008.06.017

Abdul Hamid O, Abdul Qadir M, Iftikhar N, Ur Rehman M, Uddin Ahmed M, Ihsan I (2007). Generic Multimedia Database Architecture Based upon Semantic Libraries. Informatica 18:4, pp. 483-510. Online publication date: 1-Dec-2007.
https://dl.acm.org/doi/10.5555/1413995.1413996

Li W, Sun M, Habel C (2007). Multi-modal Multi-label Semantic Indexing of Images Based on Hybrid Ensemble Learning. Advances in Multimedia Information Processing – PCM 2007, pp. 744-754. Online publication date: 2007.
https://doi.org/10.1007/978-3-540-77255-2_90
