DOI: 10.1145/1076034.1076127
Article

Hidden Markov models for automatic annotation and content-based retrieval of images and video

Published: 15 August 2005

ABSTRACT

This paper introduces a novel method for automatic annotation of images with keywords from a generic vocabulary of concepts or objects for the purpose of content-based image retrieval. An image, represented as a sequence of feature vectors characterizing low-level visual features such as color, texture, or oriented edges, is modeled as having been stochastically generated by a hidden Markov model whose states represent concepts. The parameters of the model are estimated from a set of manually annotated (training) images. Each image in a large test collection is then automatically annotated with the a posteriori probability of concepts present in it. This annotation supports content-based search of the image collection via keywords. Various aspects of model parameterization, parameter estimation, and image annotation are discussed. Empirical retrieval results are presented on two image collections: COREL and key-frames from TRECVID. Comparisons are made with two other recently developed techniques on the same datasets.
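
The annotation step described in the abstract can be illustrated with a minimal sketch of the standard forward-backward algorithm. This is not the paper's implementation: the isotropic Gaussian emissions, the uniform transition matrix, the toy two-dimensional features, and all names (`gaussian_pdf`, `concept_posteriors`, `means`, `var`) are assumptions chosen only to show how posterior concept probabilities could be computed when HMM states stand for concepts.

```python
import math

def gaussian_pdf(x, mean, var):
    """Isotropic Gaussian density (an assumed emission model, not the paper's)."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-sq / (2 * var)) / ((2 * math.pi * var) ** (d / 2))

def concept_posteriors(features, concepts, means, var, trans, init):
    """Run scaled forward-backward over the feature-vector sequence and
    return, per concept, the posterior state occupancy averaged over positions."""
    n, T = len(concepts), len(features)
    # Emission likelihoods b[t][s] for every position t and state (concept) s.
    b = [[gaussian_pdf(x, means[s], var) for s in range(n)] for x in features]

    # Forward pass, normalized at each step to avoid underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [init[s] * b[0][s] for s in range(n)]
        else:
            row = [b[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
                   for s in range(n)]
        c = sum(row)
        alpha.append([v / c for v in row])
        scale.append(c)

    # Backward pass, using the same scaling factors.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for r in range(n):
            beta[t][r] = sum(trans[r][s] * b[t + 1][s] * beta[t + 1][s]
                             for s in range(n)) / scale[t + 1]

    # Posterior occupancy gamma[t][s], averaged over the sequence.
    scores = [0.0] * n
    for t in range(T):
        g = [alpha[t][s] * beta[t][s] for s in range(n)]
        z = sum(g)
        for s in range(n):
            scores[s] += g[s] / z / T
    return dict(zip(concepts, scores))

# Hypothetical toy model: two concepts, 2-D features, uniform transitions.
concepts = ["sky", "grass"]
means = [[0.1, 0.9], [0.8, 0.2]]
trans = [[0.5, 0.5], [0.5, 0.5]]
init = [0.5, 0.5]
features = [[0.1, 0.85], [0.15, 0.9], [0.75, 0.25]]  # three image regions

scores = concept_posteriors(features, concepts, means, 0.05, trans, init)
print(scores)
```

Ranking the images in a collection by the score for a query keyword would then give a keyword-based retrieval order. How the paper itself aggregates posteriors, and which emission densities it uses, is not stated in the abstract.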


Published in

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034

Copyright © 2005 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%
