DOI: 10.1145/1031171.1031242
Article

Regularizing translation models for better automatic image annotation

Published: 13 November 2004

Abstract

The goal of automatic image annotation is to generate annotations that describe the content of images. Statistical machine translation models have previously been applied successfully to this task [8]. This approach views annotating an image as translating its content from a 'visual language' into textual words. One problem with the existing translation models is that common words are usually associated with too many different image regions. As a result, uncommon words have little chance of being used to annotate images, even though they matter for automatic image annotation because they often appear in queries. In this paper, we propose two modified translation models for automatic image annotation, the normalized translation model and the regularized translation model, that specifically address the problem of common annotation words. The basic idea is to raise the number of blobs that are associated with uncommon words. The normalized translation model does this by scaling the translation probabilities of different words by different factors. The regularized translation model achieves the same goal by introducing a special Dirichlet prior. An empirical study on the Corel dataset shows that both modified translation models substantially outperform the original translation model and several existing approaches to automatic image annotation.
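The abstract does not give the models' equations, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes each image is a list of integer blob ids (clustered image regions) paired with annotation words, fits a baseline word-given-blob table with IBM Model 1 style EM in the spirit of [4] and [8], and then applies one hypothetical per-word scaling factor (each word's total probability mass over all blobs). The function names, the toy corpus, and the choice of scaling factor are all assumptions made for illustration.

```python
from collections import defaultdict


def train_translation_model(corpus, iterations=20):
    """Estimate t[b][w] ~ P(word w | blob b) with IBM Model 1 style EM.

    corpus: list of (blobs, words) pairs, one pair per annotated image.
    """
    vocab = sorted({w for _, words in corpus for w in words})
    blob_ids = sorted({b for blobs, _ in corpus for b in blobs})
    # Uniform initialisation over the word vocabulary.
    t = {b: {w: 1.0 / len(vocab) for w in vocab} for b in blob_ids}
    for _ in range(iterations):
        counts = {b: defaultdict(float) for b in blob_ids}
        for blobs, words in corpus:
            for w in words:
                # E-step: share each word among the blobs of its image.
                z = sum(t[b][w] for b in blobs)
                for b in blobs:
                    counts[b][w] += t[b][w] / z
        # M-step: renormalise the expected counts for each blob.
        for b in blob_ids:
            total = sum(counts[b].values())
            if total > 0:
                t[b] = {w: counts[b][w] / total for w in vocab}
    return t


def scale_per_word(t):
    """Hypothetical rescaling (an assumption, not the paper's formula):
    divide every entry for word w by w's total mass over all blobs."""
    mass = defaultdict(float)
    for probs in t.values():
        for w, p in probs.items():
            mass[w] += p
    return {
        b: {w: (p / mass[w] if mass[w] > 0 else 0.0) for w, p in probs.items()}
        for b, probs in t.items()
    }


def annotate(t, blobs, top_k=2):
    """Rank words by their summed score over the blobs of one image."""
    scores = defaultdict(float)
    for b in blobs:
        for w, p in t.get(b, {}).items():
            scores[w] += p
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Toy corpus: "sky" is the common word, the other words are uncommon.
corpus = [
    ([0, 1], ["sky", "tiger"]),
    ([0, 2], ["sky", "water"]),
    ([0, 3], ["sky", "grass"]),
]
t = train_translation_model(corpus)
scaled = scale_per_word(t)
print(annotate(t, [0, 1]))
print(annotate(scaled, [0, 1], top_k=1))
```

In this toy corpus the common word `sky` spreads its probability mass over every blob, so dividing by that mass lowers its score relative to `tiger`, whose mass is concentrated on the blobs of a single image. The regularized model described in the abstract takes a different route (a Dirichlet prior inside the estimation) and is not sketched here.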

References

[1]
Barnard, K., P. Duygulu, and D. Forsyth. Clustering Art. in Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2001.
[2]
Barnard, K., P. Duygulu, N. d. Freitas, D. Forsyth, D. Blei, and M. I. Jordan, Matching Words and Pictures. Journal of Machine Learning Research, 2003. 3: p. 1107--1135.
[3]
Blei, D. and M. Jordan. Modeling Annotated Data. in Proceedings of 26th International Conference on Research and Development in Information Retrieval (SIGIR). 2003.
[4]
Brown, P., S. D. Pietra, V. D. Pietra, and R. Mercer, The Mathematics of Statistical Machine Translation. Computational Linguistics, 1993. 19(2): p. 263--311.
[5]
Chang, E., K. Goh, G. Sychay, and G. Wu, CBSA: Content-based Soft Annotation for Multimodal Image Retrieval using Bayes Point Machines. IEEE Transactions on Circuits and Systems for Video Technology, 2003. 13(1): p. 26--38.
[6]
Cusano, C., G. Ciocca, and R. Schettini. Image Annotation using SVM. in Proceedings of Internet imaging IV, Vol. SPIE 5304. 2004.
[7]
Dempster, A. P., N. M. Laird, and D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 1977. 39(1): p. 1--38.
[8]
Duygulu, P., K. Barnard, N. d. Freitas, and D. A. Forsyth. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. in Proceedings of 7th European Conference on Computer Vision. 2002.
[9]
Jeon, J., V. Lavrenko, and R. Manmatha. Automatic Image Annotation and Retrieval using Cross-Media Relevance Models. in Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. 2003.
[10]
Jin, R., C. X. Zhai, and A. G. Hauptmann. Title Language Model for Information Retrieval. in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002). 2002.
[11]
Lavrenko, V. and B. Croft. Relevance-based language models. in The 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2001.
[12]
Lavrenko, V., R. Manmatha, and J. Jeon. A Model for Learning the Semantics of Pictures. in Proceedings of Advances in Neural Information Processing Systems. 2003.
[13]
Li, J. and J. Z. Wang, Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2003. 25(9): p. 1075--1088.
[14]
Maron, O., Learning from Ambiguity. 1998, MIT.
[15]
Monay, F. and D. Gatica-Perez. On Image Auto-Annotation with Latent Space Models. in Proc. ACM International Conference on Multimedia. 2003.
[16]
Mori, Y., H. Takahashi, and R. Oka. Image-to-Word Transformation Based on Dividing and Vector Quantizing Images With Words. in MISRM'99 First International Workshop on Multimedia Intelligent Storage and Retrieval Management. 1999.
[17]
Ponte, J., A Language Modeling Approach to Information Retrieval, in Department of Computer Science. 1998, Univ. of Massachusetts at Amherst.
[18]
Schittkowski, K., NLPQL: A FORTRAN-Subroutine Solving Constrained Nonlinear Programming Problems. Annals of Operations Research, 1985. 5: p. 485--500.
[19]
Zhai, C. X. and J. Lafferty. Model-based Feedback in the KL-divergence Retrieval Model. in Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM 2001). 2001.

    Published In

    CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
    November 2004
    678 pages
    ISBN:1581138741
    DOI:10.1145/1031171
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic image annotation
    2. normalized translation model
    3. regularized translation model
    4. translation model

    Qualifiers

    • Article

    Conference

    CIKM04: Conference on Information and Knowledge Management
    November 8 - 13, 2004
    Washington, D.C., USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2014) User friendly approach for video search technique using text and image as query. 2014 Conference on IT in Business, Industry and Government (CSIBIG), pp. 1-12. DOI: 10.1109/CSIBIG.2014.7056999. Online publication date: Mar-2014
    • (2011) Adding semantics to the reliable object annotated image databases. Procedia Computer Science 3, pp. 414-419. DOI: 10.1016/j.procs.2010.12.069. Online publication date: 2011
    • (2010) Knowledge Based Image Annotation Refinement. Journal of Signal Processing Systems 58:3, pp. 387-406. DOI: 10.1007/s11265-009-0391-y. Online publication date: 1-Mar-2010
    • (2010) Machine Learning Methods in Automatic Image Annotation. Advances in Machine Learning II, pp. 387-411. DOI: 10.1007/978-3-642-05179-1_18. Online publication date: 2010
    • (2009) Improved Resulted Word Counts Optimizer for Automatic Image Annotation Problem. Fundamenta Informaticae 96:4, pp. 435-463. DOI: 10.5555/2365165.2365169. Online publication date: 1-Dec-2009
    • (2009) Improved Resulted Word Counts Optimizer for Automatic Image Annotation Problem. Fundamenta Informaticae 96:4, pp. 435-463. DOI: 10.5555/1735972.1735976. Online publication date: 1-Dec-2009
    • (2008) Multi-modal Multi-label Semantic Indexing of Images Using Unlabeled Data. Proceedings of the 2008 International Conference on Advanced Language Processing and Web Information Technology, pp. 204-209. DOI: 10.1109/ALPIT.2008.107. Online publication date: 23-Jul-2008
    • (2008) Resulted word counts optimization-A new approach for better automatic image annotation. Pattern Recognition 41:12, pp. 3562-3571. DOI: 10.1016/j.patcog.2008.06.017. Online publication date: 1-Dec-2008
    • (2007) Generic Multimedia Database Architecture Based upon Semantic Libraries. Informatica 18:4, pp. 483-510. DOI: 10.5555/1413995.1413996. Online publication date: 1-Dec-2007
    • (2007) Multi-modal Multi-label Semantic Indexing of Images Based on Hybrid Ensemble Learning. Advances in Multimedia Information Processing – PCM 2007, pp. 744-754. DOI: 10.1007/978-3-540-77255-2_90. Online publication date: 2007
