DOI: 10.1145/1076034.1076127
Article

Hidden Markov models for automatic annotation and content-based retrieval of images and video

Published: 15 August 2005

Abstract

This paper introduces a novel method for automatically annotating images with keywords from a generic vocabulary of concepts or objects, for the purpose of content-based image retrieval. An image, represented as a sequence of feature vectors characterizing low-level visual features such as color, texture, or oriented edges, is modeled as having been stochastically generated by a hidden Markov model whose states represent concepts. The parameters of the model are estimated from a set of manually annotated (training) images. Each image in a large test collection is then automatically annotated with the a posteriori probability of the concepts present in it. This annotation supports content-based search of the image collection via keywords. Various aspects of model parameterization, parameter estimation, and image annotation are discussed. Empirical retrieval results are presented on two image collections: COREL and key-frames from TRECVID. Comparisons are made with two other recently developed techniques on the same datasets.
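
The annotation step described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. The diagonal Gaussian emission model, the toy parameters, the function names (`gaussian_pdf`, `state_posteriors`, `annotate`), and the choice to average state posteriors over an image's feature vectors are all assumptions made for illustration. The sketch runs a scaled forward-backward pass over a sequence of feature vectors and reports a per-concept score for the image.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2.0 * math.pi * vi)
    return p

def state_posteriors(obs, init, trans, means, variances):
    """Scaled forward-backward pass.

    Returns gamma, where gamma[t][s] is the posterior probability that
    feature vector t was generated by state (concept) s.
    """
    n, k = len(obs), len(init)
    # Emission likelihood of every feature vector under every state.
    b = [[gaussian_pdf(o, means[s], variances[s]) for s in range(k)] for o in obs]

    # Forward pass, normalising each step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            row = [init[s] * b[0][s] for s in range(k)]
        else:
            row = [sum(alpha[t - 1][r] * trans[r][s] for r in range(k)) * b[t][s]
                   for s in range(k)]
        c = sum(row)
        scale.append(c)
        alpha.append([v / c for v in row])

    # Backward pass, reusing the forward scale factors.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for r in range(k):
            beta[t][r] = sum(trans[r][s] * b[t + 1][s] * beta[t + 1][s]
                             for s in range(k)) / scale[t + 1]

    return [[alpha[t][s] * beta[t][s] for s in range(k)] for t in range(n)]

def annotate(obs, concepts, init, trans, means, variances):
    """Assumed aggregation: average each concept's posterior over the image."""
    gamma = state_posteriors(obs, init, trans, means, variances)
    return {c: sum(row[s] for row in gamma) / len(gamma)
            for s, c in enumerate(concepts)}

# Hypothetical toy model: two concepts and one-dimensional features.
concepts = ["sky", "grass"]
init = [0.5, 0.5]
trans = [[0.7, 0.3], [0.3, 0.7]]
means = [[0.9], [0.1]]
variances = [[0.05], [0.05]]
image = [[0.85], [0.95], [0.15]]  # three feature vectors from one image

scores = annotate(image, concepts, init, trans, means, variances)
print(scores)
```

Under these assumptions, keyword retrieval amounts to ranking the images of a collection by their score for the query concept. In a real system the model parameters would be estimated from manually annotated training images rather than set by hand.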

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hidden Markov models
  2. image & video retrieval

Qualifiers

  • Article

Conference

SIGIR05

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2023) Image Captioning: A Comprehensive Survey, Comparative Analysis of Existing Models, and Research Gaps. 2023 Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), pages 1120-1127. DOI: 10.1109/ICAISS58487.2023.10250630. Online publication date: 23-Aug-2023
  • (2020) A review on visual content-based and users' tags-based image annotation: methods and techniques. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-08862-1. Online publication date: 9-May-2020
  • (2019) Privacy-aware Tag Recommendation for Accurate Image Privacy Prediction. ACM Transactions on Intelligent Systems and Technology, 10:4, pages 1-28. DOI: 10.1145/3335054. Online publication date: 12-Aug-2019
  • (2019) A weighted KNN-based automatic image annotation method. Neural Computing and Applications. DOI: 10.1007/s00521-019-04114-y. Online publication date: 7-Mar-2019
  • (2017) Large-scale image annotation with image-text hybrid learning models. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 21:11, pages 2857-2869. DOI: 10.1007/s00500-016-2221-z. Online publication date: 1-Jun-2017
  • (2016) Soziale Medien in der empirischen Forschung. Handbuch Soziale Medien, pages 389-408. DOI: 10.1007/978-3-658-03765-9_21. Online publication date: 6-Oct-2016
  • (2015) Optical flow-based representation for video action detection. Emerging Trends in Image Processing, Computer Vision and Pattern Recognition, pages 331-351. DOI: 10.1016/B978-0-12-802045-6.00021-1. Online publication date: 2015
  • (2015) Soziale Medien in der empirischen Forschung. Handbuch Soziale Medien, pages 1-19. DOI: 10.1007/978-3-658-03895-3_21-1. Online publication date: 12-Nov-2015
  • (2014) Bibliography. Semantic Multimedia Analysis and Processing, pages 421-512. DOI: 10.1201/b17080-21. Online publication date: 18-Jun-2014
  • (2014) Content factors segmentation with CBIR in real world. 2014 11th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), pages 297-300. DOI: 10.1109/ICCWAMTIP.2014.7073412. Online publication date: Dec-2014
